Principled Preferential Bayesian Optimization

Wenjie Xu    Wenbin Wang    Yuning Jiang    Bratislav Svetozarevic    Colin N. Jones
Abstract

We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the total cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.

Machine Learning, ICML

1 Introduction

Bayesian optimization (BO) is a popular sample-efficient black-box optimization method (Shahriari et al., 2015; Frazier, 2018). It is widely applied to tuning hyperparameters of machine learning models (Snoek et al., 2012), optimizing the performance of control systems (Xu et al., 2022b), and discovering new drugs (Negoescu et al., 2011), among other tasks.

The main idea of BO is based on surrogate modeling. That is, a learning algorithm (typically Gaussian process regression) is applied to learn the unknown black-box function using historical samples, which then outputs a learned surrogate together with uncertainty quantification. Then BO algorithms, such as the popular Expected Improvement (Jones et al., 1998) and GP-UCB algorithms (Srinivas et al., 2012), use the information of this learned surrogate and uncertainty quantification to choose the next sample point.

The conventional BO setting assumes that each sample, which typically corresponds to a round of real-world experiment or software simulation in practice, returns a noisy scalar evaluation of the black-box function. However, many human-in-the-loop systems cannot return such a scalar value, or it is much more difficult to obtain such a scalar evaluation directly from humans, since humans are bad at sensing absolute magnitude (Kahneman & Tversky, 2013). In contrast, it is much easier for a human to compare a pair of solutions and report which is preferred (Lichtenstein & Slovic, 1971; Tversky & Kahneman, 1974; Kahneman & Tversky, 2013).

This gives rise to preferential Bayesian optimization (González et al., 2017), where the scalar evaluation of the black-box function is not available. Instead, we can query an oracle to compare a pair of solutions, a so-called duel. Such settings arise in a broad range of applications, such as visual design optimization (Koyama et al., 2020), thermal comfort optimization (Abdelrahman & Miller, 2022), and robotic gait optimization (Li et al., 2021).

Existing preferential Bayesian optimization methods are mostly heuristic, without formal guarantees on cumulative regret or convergence to the global optimal solution. For example, González et al. (2017) propose several heuristic acquisition strategies, including expected improvement and Thompson sampling-based methods, for preferential Bayesian optimization. Mikkola et al. (2020) extend preferential Bayesian optimization to the projective setting. Takeno et al. (2023) propose a Thompson sampling-based method for practical preferential Bayesian optimization with skew Gaussian processes. Astudillo et al. (2023) propose a decision-theoretic acquisition strategy with a convergence rate guarantee for a finite input set. However, as far as we know, none of the existing preferential Bayesian optimization methods provides theoretical guarantees on cumulative regret or global convergence with a continuous input space, partially due to the challenge of quantifying uncertainty in a principled way.

Beyond preferential BO, optimization from preference feedback has also been investigated in other contexts. In the following, we first survey the related work other than preferential BO and then highlight our unique contributions.

Dueling Bandits. In dueling bandits (Yue et al., 2012), the goal is to identify the best arm from a finite set of arms, using only noisy comparison feedback. The setting has also been extended to adversarial (Gajane et al., 2015) and contextual (Dudík et al., 2015; Saha & Krishnamurthy, 2022) settings. The extension most related to this work is kernelized dueling bandits (Sui et al., 2017, 2018). However, this line of research is typically restricted to the case where the number of arms is finite, and the regret bound can blow up to infinity when the number of arms goes to infinity (e.g., Thm. 2 in Sui et al. (2017)). A recent work (Mehta et al., 2023) proposes an offline method with a suboptimality bound by learning the winning probability, which, however, is not applicable to online learning problems due to linear growth of regret over the randomly sampled compared point sequences. In the existing literature, there is no cumulative regret bound that depends on an inherent complexity metric (such as covering number and maximum information gain (Srinivas et al., 2012)) of the black-box function with continuous input space.

Convex Optimization with Preference Feedback. Saha et al. (2021) and Yue & Joachims (2009) consider the optimization of convex functions, where only a comparison oracle of function values over different points is available. The proposed methods estimate the gradient from the preference signals. However, this line of research restricts the function to be convex, while in practice, the black-box function may be non-convex. On non-convex functions, such methods may get stuck in a local optimum, and they can be sample-inefficient since each estimate of the gradient already needs several samples.

Reinforcement Learning from Human Feedback. Reinforcement learning from human feedback (RLHF) (Christiano et al., 2017; Griffith et al., 2013) has recently become very popular. It has been applied successfully in many areas, including training robots (Hiranaka et al., 2023), playing games (Warnell et al., 2018), and, remarkably, large language models (Ouyang et al., 2022). On the theoretical side of RLHF research, recent results analyze the offline learning of the implicit reward function (Zhu et al., 2023) and model-based optimistic reinforcement learning from human feedback (Wang et al., 2023). However, the existing theoretical analysis either only deals with finite-dimensional generalized linear models or relies heavily on the complexity measure of Eluder dimension (Osband & Van Roy, 2014). The existing generic theoretical analysis for RLHF cannot be directly applied to the Bayesian optimization setting, where the Eluder dimension of the infinite-dimensional reproducing kernel Hilbert space is not well understood.

Optimistic Model-based Sequential Decision Making. Optimism in the face of uncertainty is a widely adopted design principle for model-based sequential decision making problems, such as in Bayesian optimization/reinforcement learning (Wu et al., 2022; Xu et al., 2023; Pacchiano et al., 2021; Curi et al., 2020; Liu et al., 2023). The optimism principle has also been applied to RLHF (Wang et al., 2023) recently. However, as far as we know, there is no existing principled optimistic algorithm for preferential BO yet.

Our contributions. Guided by the optimism principle, we design a preferential Bayesian optimization algorithm that enjoys information-theoretic bounds on the cumulative regret. Specifically, our contributions include:

  • Algorithm design. Inspired by the recent work of the confidence set based on optimistic maximum likelihood estimate (Liu et al., 2023) and the likelihood ratio confidence set idea (Owen, 1990; Emmenegger et al., 2023), we construct a confidence set by only using the preference feedback. We then exploit the principle of optimism in the face of uncertainty to design a Principled Optimistic Preferential Bayesian Optimization (POP-BO) algorithm, together with a scheme of reporting an estimated best solution.

  • Theoretical analysis. Under some mild regularity assumptions, we prove an information-theoretic bound on the cumulative regret of the POP-BO algorithm, which is the first of its kind for preferential Bayesian optimization. (Footnote 1: Mehta et al. (2023) provide a bound on the partial cumulative regret, which only captures the suboptimality of one point in each compared duel. We consider the stronger total cumulative regret over both points in the compared duel. See Appendix Q for a detailed discussion.) This is significant since previous information-theoretic regret bounds typically assume direct scalar evaluations of black-box functions (Srinivas et al., 2012), while the recent generic theoretical results for RLHF typically rely on Eluder dimension, which is not well understood for RKHS.

  • Efficient computations. The optimistic algorithm needs to solve bi-level optimization problems with the inner variable in an infinite-dimensional function space. We leverage the representer theorem (Schölkopf et al., 2001) to reduce the inner optimization problem to finite-dimensional space, which turns out to be tractable via convex optimization. This further allows efficient grid-free joint optimization.

  • Empirical validations and toolbox. Experimental results show that POP-BO consistently achieves better or competitive performance as compared to the state-of-the-art heuristic baselines and more than 10 times speed-up in computation as compared to the Thompson sampling based method. We also provide a reusable toolbox for future applications of our method. (Footnote 2: Code link: https://github.com/PREDICT-EPFL/POP-BO)

2 Problem Statement

We consider the maximization of a black-box function $f$,

$$\max_{x\in\mathcal{X}}\; f(x), \tag{2}$$

where $\mathcal{X}\subset\mathbb{R}^{d}$ with $d$ as the input dimension. We use $x\succ x'$ to denote the event that '$x$ is preferred to $x'$'. In contrast to the standard BO setup, we assume that we cannot directly evaluate the scalar value of $f(x)$. Rather, we have a comparison oracle that compares any two points $x, x'$ and returns a preference signal $\mathbf{1}_{x\succ x'}$, which is defined as

$$\mathbf{1}_{x\succ x'}=\begin{cases}1, & \text{if } x \text{ is preferred,}\\ 0, & \text{if } x' \text{ is preferred.}\end{cases} \tag{3}$$

Before proceeding, we state a set of common assumptions.

Assumption 2.1.

$\mathcal{X}$ is compact and nonempty.

Assumption 2.1 is reasonable because, in many applications of Bayesian optimization (e.g., continuous hyperparameter tuning), we are able to restrict the optimization to certain ranges based on domain knowledge. Regarding the black-box function $f$, we assume that,

Assumption 2.2.

$f\in\mathcal{H}_{k}$, where $k:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}$ is a symmetric, positive semidefinite kernel function and $\mathcal{H}_{k}$ is the corresponding reproducing kernel Hilbert space (RKHS, see (Schölkopf et al., 2001)). Furthermore, we assume $\|f\|_{k}\leq B$, where $\|\cdot\|_{k}$ is the norm induced by the inner product in the corresponding RKHS.

Assumption 2.2 requires that the function to be optimized is regular in the sense that it has a bounded norm in the RKHS, which is a common assumption (Chowdhury & Gopalan, 2017a; Zhou & Ji, 2022). For simplicity, we will use $\mathcal{B}_{f}$ to denote the set $\left\{\tilde{f}\in\mathcal{H}_{k} \,\middle|\, \|\tilde{f}\|_{k}\leq B\right\}$, which is a ball with radius $B$ in $\mathcal{H}_{k}$.

Remark 2.3 (Choice of $B$).

In practice, a tight norm bound $B$ might not be known beforehand. In the theoretical analysis, we only assume that there is a finite bound $B$, possibly unknown beforehand. In the practical implementation of our algorithm, we can adapt $B$ based on hypothesis testing (Newey & McFadden, 1994). For example, we can double $B$ every time we detect a low likelihood value (see Appendix G for more elaboration).

Assumption 2.4.

$k(x,x')\leq 1,\ \forall x,x'\in\mathcal{X}$, and $k(x,x')$ is continuous on $\mathbb{R}^{d}\times\mathbb{R}^{d}$.

Assumption 2.4 is a commonly adopted mild assumption in the BO literature (Srinivas et al., 2012; Chowdhury & Gopalan, 2017a). It holds for most commonly used kernel functions after normalization, such as the linear kernel, the Matérn kernel, and the squared exponential kernel.

Assumption 2.5.

The random preference feedback $\mathbf{1}_{x\succ x'}$ from the comparison oracle follows the Bernoulli distribution with $\mathbb{P}(\mathbf{1}_{x\succ x'}=1)=p_{x\succ x'}=\sigma(y-y')$, where $y=f(x)$, $y'=f(x')$ and $\sigma(u)=1/(1+e^{-u})$.

Assumption 2.5 equivalently assumes that,

$$\mathbb{P}(\mathbf{1}_{x\succ x'}=1)=\frac{e^{f(x)}}{e^{f(x)}+e^{f(x')}}, \tag{4}$$

which can be observed to be the widely used Bradley-Terry-Luce (BTL) model (Bradley & Terry, 1952) for pairwise comparison. The intuition here is that the more advantage $f(x)$ has as compared to $f(x')$, the more likely $x$ is preferred. The same comparison model is also used in, e.g., training large language models (Ouyang et al., 2022). At step $t$, our algorithm queries the pair $(x_{t},x_{t}')$ and the comparison oracle returns the random preference $\mathbf{1}_{x_{t}\succ x_{t}'}\in\{0,1\}$. For simplicity of notation, we use $\mathbf{1}_{\tau}\in\{0,1\}$ to denote the realization of the Bernoulli random variable $\mathbf{1}_{x_{\tau}\succ x_{\tau}'}$ when querying the comparison oracle at step $\tau$. Based on the historical comparison results

$$\mathcal{D}_{t} := \{(x_{\tau},x_{\tau}',\mathbf{1}_{\tau})\}_{\tau=1}^{t}, \tag{5}$$

the algorithm needs to decide the next pair of samples to compare. Unless otherwise stated, all the theoretical results in this paper hold under Assumptions 2.1, 2.2, 2.4, and 2.5, and all the corresponding proofs are in the appendices.
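To make the feedback model concrete, the following is a minimal simulation sketch of the comparison oracle in Assumption 2.5 and of the dataset in (5). It is an illustration only: the latent function `toy_f`, the one-dimensional input range, and the random sequential query rule are assumptions made for this example and are not part of the problem statement.

```python
import numpy as np


def sigmoid(u):
    """Logistic link sigma(u) = 1 / (1 + exp(-u)), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def preference_probability(f, x, x_prime):
    """P(1_{x > x'} = 1) = sigma(f(x) - f(x')), as in Assumption 2.5."""
    return sigmoid(f(x) - f(x_prime))


def query_oracle(f, x, x_prime, rng):
    """Draw one Bernoulli preference: 1 if x is preferred, 0 if x' is preferred."""
    return int(rng.random() < preference_probability(f, x, x_prime))


def toy_f(x):
    """Hypothetical latent function, used only for illustration."""
    return -4.0 * (x - 0.3) ** 2


rng = np.random.default_rng(0)
dataset = []  # plays the role of D_t: tuples (x_tau, x'_tau, 1_tau)
x_prev = rng.uniform(0.0, 1.0)
for _ in range(5):
    x_new = rng.uniform(0.0, 1.0)
    dataset.append((x_new, x_prev, query_oracle(toy_f, x_new, x_prev, rng)))
    x_prev = x_new
```

The algorithm only ever sees the tuples stored in `dataset`; the values of `toy_f` stay hidden behind the oracle.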

3 High Confidence Set

Notations. The probability, denoted as $\mathbb{P}(\cdot)$, is taken over the randomness of the preference feedback generated by the comparison oracle and the randomness generated by the algorithm. Let the filtration $\mathcal{F}_{t}$ capture all the randomness up to step $t$. $\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})$ denotes the standard covering number (Zhou, 2002) of the function space ball $\mathcal{B}_{f}$ with the covering balls' radius $\epsilon$ and the infinity norm $\|\cdot\|_{\infty}$. We will also use $[\tau]$ to denote the set $\{1,\cdots,\tau\}$.

3.1 Likelihood-based Confidence Set

We first introduce the function,

$$p_{\hat{f}}(x_{\tau},x_{\tau}',\mathbf{1}_{\tau}) := \mathbf{1}_{\tau}\,\sigma\big(\hat{f}(x_{\tau})-\hat{f}(x_{\tau}')\big)+(1-\mathbf{1}_{\tau})\Big(1-\sigma\big(\hat{f}(x_{\tau})-\hat{f}(x_{\tau}')\big)\Big), \tag{6}$$

which is the likelihood of $\hat{f}$ over the event $\mathbf{1}_{x_{\tau}\succ x_{\tau}'}=\mathbf{1}_{\tau}$ under the Bernoulli preference model in Assumption 2.5.

We can then derive the likelihood function of a fixed function $\hat{f}$ over the historical preference dataset $\mathcal{D}_{t}$. (Footnote 3: Note that $\mathbb{P}_{\hat{f}}(\cdot)$ is the likelihood function in $\hat{f}$ over the historical data $\mathcal{D}_{t}$, not the probability taken over the data/algorithm randomness.)

$$\mathbb{P}_{\hat{f}}\big((x_{\tau},x_{\tau}',\mathbf{1}_{\tau})_{\tau=1}^{t}\big) := \prod_{\tau=1}^{t}p_{\hat{f}}(x_{\tau},x_{\tau}',\mathbf{1}_{\tau}). \tag{7}$$

Taking log gives the log-likelihood function,

$$\begin{aligned}
\ell_{t}(\hat{f}) :={}& \log\mathbb{P}_{\hat{f}}\big((x_{\tau},x_{\tau}',\mathbf{1}_{\tau})_{\tau=1}^{t}\big)=\sum_{\tau=1}^{t}\log p_{\hat{f}}(x_{\tau},x_{\tau}',\mathbf{1}_{\tau})\\
={}& \sum_{\tau=1}^{t}\log\left(\frac{e^{z_{\tau}}\mathbf{1}_{\tau}+e^{z_{\tau}'}(1-\mathbf{1}_{\tau})}{e^{z_{\tau}}+e^{z_{\tau}'}}\right)\\
={}& \sum_{\tau=1}^{t}\left(z_{\tau}\mathbf{1}_{\tau}+z_{\tau}'(1-\mathbf{1}_{\tau})\right)-\sum_{\tau=1}^{t}\log\left(e^{z_{\tau}}+e^{z_{\tau}'}\right),
\end{aligned} \tag{8}$$

where $z_{\tau}=\hat{f}(x_{\tau})$, $z_{\tau}'=\hat{f}(x_{\tau}')$, $\mathbf{1}_{\tau}\in\{0,1\}$ is the data realization of $\mathbf{1}_{x_{\tau}\succ x_{\tau}'}$, and the last equality can be checked to hold for either $\mathbf{1}_{\tau}=1$ or $\mathbf{1}_{\tau}=0$.
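As a quick numerical check of the last equality in (8), the sketch below evaluates the log-likelihood in two ways: directly from the per-duel likelihood in (6), and through the linear-minus-log-sum-exp form on the last line of (8). The function names and the toy values are assumptions made for illustration. The second form uses `np.logaddexp`, which stays finite even when the function values are large in magnitude.

```python
import numpy as np


def log_likelihood(z, z_prime, labels):
    """Last line of (8): sum of (z*1 + z'*(1-1)) minus sum of log(e^z + e^z')."""
    z = np.asarray(z, dtype=float)
    z_prime = np.asarray(z_prime, dtype=float)
    labels = np.asarray(labels, dtype=float)
    linear_term = labels * z + (1.0 - labels) * z_prime
    log_partition = np.logaddexp(z, z_prime)  # log(e^z + e^z'), overflow-safe
    return float(np.sum(linear_term - log_partition))


def log_likelihood_direct(z, z_prime, labels):
    """First line of (8): sum of log p from (6). Can underflow for extreme values."""
    z = np.asarray(z, dtype=float)
    z_prime = np.asarray(z_prime, dtype=float)
    labels = np.asarray(labels, dtype=float)
    p_win = 1.0 / (1.0 + np.exp(-(z - z_prime)))
    p = labels * p_win + (1.0 - labels) * (1.0 - p_win)
    return float(np.sum(np.log(p)))


# Toy function values at three duels (illustrative numbers only).
z_demo = [0.3, -1.2, 2.0]
z_prime_demo = [0.1, 0.4, -0.5]
labels_demo = [1, 0, 1]
gap = abs(log_likelihood(z_demo, z_prime_demo, labels_demo)
          - log_likelihood_direct(z_demo, z_prime_demo, labels_demo))
```

Here `gap` should be at the level of floating-point round-off, which is consistent with the two expressions in (8) being equal.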

A common method for statistical estimation is to maximize the likelihood. Hence, we introduce the maximum likelihood estimator (MLE),

$$\hat{f}^{\mathrm{MLE}}_{t}\in\arg\max_{\tilde{f}\in\mathcal{B}_{f}}\log\mathbb{P}_{\tilde{f}}\big((x_{\tau},x_{\tau}',\mathbf{1}_{\tau})_{\tau=1}^{t}\big). \tag{9}$$
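One way to approximate the estimator in (9) numerically is sketched below. This is not necessarily the computational method used in the paper: it is a minimal illustration that assumes a squared exponential kernel, parameterizes the candidate as $\tilde{f}=\sum_i \alpha_i k(\cdot, X_i)$ over the queried points (in the spirit of the representer theorem), and runs projected functional gradient ascent, rescaling the coefficients whenever the RKHS norm exceeds $B$. All function names, the length scale, the step size, and the toy data are assumptions of this example.

```python
import numpy as np


def se_kernel(A, C, lengthscale=0.2):
    """Squared exponential kernel matrix with k(x, x) = 1 (assumed kernel choice)."""
    sq_dists = np.sum((A[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def sigmoid(u):
    """Logistic link, written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def log_likelihood(alpha, K, idx, idx_prime, labels):
    """ell_t of f = sum_i alpha_i k(., X_i), using the last line of (8)."""
    z = K @ alpha  # function values at all stored points
    zt, zt_prime = z[idx], z[idx_prime]
    return float(np.sum(labels * zt + (1.0 - labels) * zt_prime
                        - np.logaddexp(zt, zt_prime)))


def fit_mle(K, idx, idx_prime, labels, norm_bound, n_iters=2000):
    """Projected gradient ascent on the concave log-likelihood over the RKHS ball."""
    n, t = K.shape[0], len(labels)
    alpha = np.zeros(n)
    step = 1.0 / t  # conservative step size for a kernel bounded by 1
    for _ in range(n_iters):
        z = K @ alpha
        residual = labels - sigmoid(z[idx] - z[idx_prime])
        # Functional gradient: sum_tau residual_tau * (k(., x_tau) - k(., x'_tau)).
        direction = np.zeros(n)
        np.add.at(direction, idx, residual)
        np.add.at(direction, idx_prime, -residual)
        alpha = alpha + step * direction
        # Project back onto the ball ||f||_k <= B by radial rescaling.
        rkhs_norm = np.sqrt(max(float(alpha @ K @ alpha), 0.0))
        if rkhs_norm > norm_bound:
            alpha = alpha * (norm_bound / rkhs_norm)
    return alpha


# Toy data: 8 random points in [0, 1], each compared with its predecessor.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(8, 1))
idx, idx_prime = np.arange(1, 8), np.arange(0, 7)
f_true = -4.0 * (X[:, 0] - 0.3) ** 2  # hypothetical ground truth
labels = (rng.random(7) < sigmoid(f_true[idx] - f_true[idx_prime])).astype(float)
K = se_kernel(X, X)
alpha_hat = fit_mle(K, idx, idx_prime, labels, norm_bound=2.0)
```

The sketch avoids inverting the kernel matrix, so it tolerates nearly duplicated query points. A general-purpose convex solver could be substituted for the hand-written loop.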

With the maximum likelihood estimator introduced, the posterior high confidence set can be derived as shown in Thm. 3.1 using the maximum log-likelihood value.

Theorem 3.1 (Likelihood-based Confidence Set).

ϵ,δ>0for-allitalic-ϵ𝛿0\forall\epsilon,\delta>0∀ italic_ϵ , italic_δ > 0, let,

ft+1:={f~f|t(f~)t(f^tMLE)β1(ϵ,δ,t)},assignsubscriptsuperscript𝑡1𝑓conditional-set~𝑓subscript𝑓subscript𝑡~𝑓subscript𝑡subscriptsuperscript^𝑓MLE𝑡subscript𝛽1italic-ϵ𝛿𝑡\mathcal{B}^{t+1}_{f}\vcentcolon=\{\tilde{f}\in\mathcal{B}_{f}|\ell_{t}(\tilde% {f})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{1}(\epsilon,\delta,t)\},caligraphic_B start_POSTSUPERSCRIPT italic_t + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT := { over~ start_ARG italic_f end_ARG ∈ caligraphic_B start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT | roman_ℓ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( over~ start_ARG italic_f end_ARG ) ≥ roman_ℓ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( over^ start_ARG italic_f end_ARG start_POSTSUPERSCRIPT roman_MLE end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_ϵ , italic_δ , italic_t ) } , (10)

where $\beta_{1}(\epsilon,\delta,t):=\sqrt{32tB^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{6\delta}}+C_{L}\epsilon t=\mathcal{O}\left(\sqrt{t\log\frac{t\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{\delta}}+\epsilon t\right)$, with $C_{L}$ a constant independent of $\delta$, $t$, and $\epsilon$. We have,

$$\mathbb{P}\left(f\in\mathcal{B}^{t+1}_{f},\;\forall t\geq 1\right)\geq 1-\delta. \tag{11}$$
Figure 1: Demonstration of the maximum likelihood function and the confidence set based on likelihood. The results are derived using random sequential comparisons (that is, comparing $x_{t}$ to $x_{t-1}$), where each $x_{t}$ is sampled uniformly at random from the input set.

Intuitively, the confidence set $\mathcal{B}_{f}^{t+1}$ includes the functions whose log-likelihood value is only 'a little worse' than that of the maximum likelihood estimator. It turns out that by correctly setting the 'worse' level $\beta_{1}$, the confidence set $\mathcal{B}_{f}^{t+1}$ contains the ground-truth function $f$ with high probability. This is reasonable because the preference data is generated with the ground-truth function, and thus the likelihood of the ground-truth function will not be too much lower than that of the maximum likelihood estimator.
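As a rough illustration of this construction, the sketch below builds the set in Eq. (10) over a small finite family of candidate functions. Everything specific here is an assumption made for the example and not the paper's implementation: the finite candidate family stands in for the RKHS ball $\mathcal{B}_{f}$, the link function is taken to be logistic, and $\beta_{1}$ is fixed to the constant $1.0$.

```python
import numpy as np

def log_likelihood(f_vals, pairs, prefs):
    """Log-likelihood of preference data under an assumed logistic link.

    f_vals : (n,) candidate function values on a finite grid
    pairs  : (t, 2) integer grid indices of (x_tau, x_tau')
    prefs  : (t,) 1.0 if x_tau was preferred over x_tau', else 0.0
    """
    z = f_vals[pairs[:, 0]] - f_vals[pairs[:, 1]]
    # log sigma(z) = -log(1 + exp(-z)), evaluated stably via logaddexp
    return np.sum(-prefs * np.logaddexp(0.0, -z) - (1 - prefs) * np.logaddexp(0.0, z))

def confidence_set(candidates, pairs, prefs, beta1):
    """Mask of candidates whose log-likelihood is within beta1 of the maximum."""
    ll = np.array([log_likelihood(f, pairs, prefs) for f in candidates])
    return ll >= ll.max() - beta1, ll

# Hypothetical setup: 9 candidate functions on a 20-point grid.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 20)
candidates = np.array([a * np.sin(3 * grid + b)
                       for a in (0.5, 1.0, 2.0) for b in (0.0, 1.0, 2.0)])
f_true = candidates[4]

# Simulated random comparisons generated from the ground-truth candidate.
pairs = rng.integers(0, len(grid), size=(200, 2))
p = 1.0 / (1.0 + np.exp(-(f_true[pairs[:, 0]] - f_true[pairs[:, 1]])))
prefs = (rng.random(200) < p).astype(float)

mask, ll = confidence_set(candidates, pairs, prefs, beta1=1.0)
```

The maximum likelihood candidate always belongs to the set, and enlarging `beta1` can only add members, which mirrors the role of the 'worse' level described above.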

Remark 3.2 (Choice of $\epsilon$).

In Thm. 3.1, $\beta_{1}(\epsilon,\delta,t)$ also depends on a small positive value $\epsilon$, which is to be chosen. In the theoretical analysis, it will be seen that $\epsilon$ can be selected to be $1/T$, where $T$ is the algorithm's running horizon.
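To see why this choice is convenient, a quick check (assuming $t\leq T$) shows that the linear term in $\beta_{1}$ stays bounded by a constant:

$$C_{L}\,\epsilon\,t\Big|_{\epsilon=1/T}=C_{L}\frac{t}{T}\leq C_{L},\qquad\text{so}\qquad\beta_{1}\left(\tfrac{1}{T},\delta,t\right)\leq\sqrt{32tB^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\tfrac{1}{T},\|\cdot\|_{\infty})}{6\delta}}+C_{L}.$$

The price is that the covering number is now evaluated at the finer scale $1/T$, which enters only through the logarithm.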

Remark 3.3 (Likelihood Ratio Idea).

The confidence set $\mathcal{B}_{f}^{t+1}$ contains the functions $\tilde{f}$ that satisfy,

$$\frac{\mathbb{P}_{\tilde{f}}\left((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\right)}{\mathbb{P}_{\hat{f}^{\mathrm{MLE}}_{t}}\left((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\right)}\geq e^{-\beta_{1}(\epsilon,\delta,t)}, \tag{12}$$

which is the likelihood ratio confidence set (Owen, 1990).
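The equivalence with Eq. (10) follows in one step, assuming (as the notation in Eqs. (9) and (10) suggests) that $\ell_{t}(\tilde{f})$ denotes the log-likelihood $\log\mathbb{P}_{\tilde{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})$:

$$\ell_{t}(\tilde{f})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{1}(\epsilon,\delta,t)\iff\log\frac{\mathbb{P}_{\tilde{f}}\left((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\right)}{\mathbb{P}_{\hat{f}^{\mathrm{MLE}}_{t}}\left((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\right)}\geq-\beta_{1}(\epsilon,\delta,t),$$

and exponentiating both sides gives the ratio form.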

Remark 3.4.

Surrogate-based black-box optimization with kernel methods is often referred to as Bayesian optimization due to its close relation to the Bayesian Gaussian process model. Hence, we refer to our method as preferential BO.

Based on the confidence set in Thm. 3.1, we can derive the pointwise confidence range for the black-box function.

$$\inf_{\tilde{f}\in\mathcal{B}^{t}_{f}}\tilde{f}(x)\leq f(x)\leq\sup_{\tilde{f}\in\mathcal{B}^{t}_{f}}\tilde{f}(x). \tag{13}$$

Fig. 1 demonstrates the maximum likelihood estimate function and the confidence range with the ground truth function sampled from a Gaussian process, random comparison inputs, and $\beta_{1}(\epsilon,\delta,t)$ set to be a constant $1.0$. It can be seen that, as more comparison data is collected, the maximum likelihood estimate approximates the ground truth increasingly well and the confidence range shrinks.

3.2 Bound Duel-wise Error

Thm. 3.1 gives a high confidence set based on the likelihood function. However, it is not straightforward how the likelihood bounds lead to error bounds on the function value differences over a compared pair $(x,x^{\prime})$, which determine the preference distribution. The following lemma gives such a bound over the historical samples.

Lemma 3.5 (Elliptical Bound).

For any estimate $\hat{f}_{t+1}\in\mathcal{B}_{f}^{t+1}$ that is measurable with respect to the filtration $\mathcal{F}_{t}$, we have, with probability at least $1-\delta$, $\forall t\geq 1$,

$$\sum_{\tau=1}^{t}\left(\left(\hat{f}_{t+1}(x_{\tau})-\hat{f}_{t+1}(x^{\prime}_{\tau})\right)-\left(f(x_{\tau})-f(x^{\prime}_{\tau})\right)\right)^{2}\leq\beta(\epsilon,\delta/2,t), \tag{14}$$

and

$$f\in\mathcal{B}_{f}^{t+1}, \tag{15}$$

where $\beta(\epsilon,\delta/2,t)=\frac{\underline{\sigma^{\prime}}^{2}}{H_{\sigma}}\left(\beta_{2}(\epsilon,\delta/2,t)+2\beta_{1}(\epsilon,\delta/2,t)\right)=\mathcal{O}\left(\sqrt{t\log\frac{t\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{\delta}}+\epsilon t+\epsilon^{2}t\right)$, with $\beta_{2}(\epsilon,\delta,t)=8H_{\sigma}\bar{\sigma^{\prime}}^{2}\epsilon^{2}t+2C_{L}\epsilon t+\sqrt{8tB_{p}^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{3\delta}}$ and the constants $\underline{\sigma^{\prime}},H_{\sigma},\bar{\sigma^{\prime}},B_{p}$ as defined in Appendix B.

Lem. 3.5 highlights that, with high probability, the difference values of every function in the confidence set over the historical sample points lie in a ball centered at the ground-truth function difference values with radius $\sqrt{\beta(\epsilon,\delta/2,t)}$. It also indicates that our likelihood-based learning scheme can gradually learn the function differences $f(x_{\tau})-f(x_{\tau}^{\prime})$ but not the absolute value $f(x_{\tau})$. This is reasonable since shifting $f$ by a constant will not change the distribution of preference feedback.
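The shift invariance can be checked numerically in a few lines. The snippet assumes a logistic link for the preference probability and uses a hypothetical test function `f`; both are illustrative choices only.

```python
import numpy as np

def pref_prob(f_x, f_xp):
    """Probability that x is preferred over x' under an assumed logistic link."""
    return 1.0 / (1.0 + np.exp(-(f_x - f_xp)))

f = lambda x: np.sin(3 * x)   # hypothetical black-box function
x, xp = 0.2, 0.7
shift = 5.0

p_original = pref_prob(f(x), f(xp))
p_shifted = pref_prob(f(x) + shift, f(xp) + shift)
```

Because only the difference enters the link, `p_original` and `p_shifted` agree up to floating-point error, so no amount of preference data can identify the constant offset.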

Furthermore, to derive an error bound over a new pair $(x,x^{\prime})$, we need to quantify the uncertainty of $\tilde{f}(x)-\tilde{f}(x^{\prime})$, where $\tilde{f}\in\mathcal{B}_{f}$. Since $-\tilde{f}\in\mathcal{B}_{f}$ by the definition of $\mathcal{B}_{f}$, it can be seen that $\tilde{f}(x)-\tilde{f}(x^{\prime})\in\mathcal{B}_{ff^{\prime}}$, where

$$\mathcal{B}_{ff^{\prime}}:=\left\{F(x,x^{\prime})=\tilde{f}(x)+\tilde{f}^{\prime}(x^{\prime})\;\middle|\;\tilde{f},\tilde{f}^{\prime}\in\mathcal{B}_{f}\right\}. \tag{16}$$

Indeed, $\mathcal{B}_{ff^{\prime}}$ is the ball with radius $2B$ in the RKHS equipped with the additive kernel function $k^{ff^{\prime}}((x,x^{\prime}),(\bar{x},\bar{x}^{\prime})):=k(x,\bar{x})+k(x^{\prime},\bar{x}^{\prime})$, which we term the augmented RKHS here, and the inner product $\langle f_{1}+f_{1}^{\prime},f_{2}+f_{2}^{\prime}\rangle_{k^{ff^{\prime}}}:=\langle f_{1},f_{2}\rangle_{k}+\langle f_{1}^{\prime},f_{2}^{\prime}\rangle_{k}$. Readers are referred to (Christmann & Hable, 2012; Kandasamy et al., 2015) for more details of the additive kernel and the corresponding RKHS. To quantify the uncertainty of a new pair $(x,x^{\prime})$, we further introduce the function,

$$\left(\sigma^{ff^{\prime}}_{t}(\omega)\right)^{2}=k^{ff^{\prime}}(\omega,\omega)-k^{ff^{\prime}}(\omega_{1:t-1},\omega)^{\top}\left(K^{ff^{\prime}}_{t-1}+\lambda I\right)^{-1}k^{ff^{\prime}}(\omega_{1:t-1},\omega), \tag{17}$$

where $\omega:=(x,x^{\prime})$, $\omega_{1:t-1}:=((x_{\tau},x_{\tau}^{\prime}))_{\tau=1}^{t-1}$, $K_{t-1}^{ff^{\prime}}:=\left(k^{ff^{\prime}}((x_{\tau_{1}},x_{\tau_{1}}^{\prime}),(x_{\tau_{2}},x_{\tau_{2}}^{\prime}))\right)_{\tau_{1}\in[t-1],\tau_{2}\in[t-1]}$, and $\lambda$ is a positive regularization constant.
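A minimal numerical sketch of Eq. (17) is given below. The squared-exponential base kernel, its lengthscale, the one-dimensional inputs, and the value of $\lambda$ are all assumptions chosen for illustration; the names `rbf`, `k_aug`, and `sigma_aug` are hypothetical.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Assumed squared-exponential base kernel k on 1-D inputs; returns a matrix."""
    a = np.atleast_1d(a)[:, None]
    b = np.atleast_1d(b)[None, :]
    return np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def k_aug(W1, W2):
    """Additive kernel k^{ff'}((x, x'), (xbar, xbar')) = k(x, xbar) + k(x', xbar')."""
    return rbf(W1[:, 0], W2[:, 0]) + rbf(W1[:, 1], W2[:, 1])

def sigma_aug(omega, history, lam=0.1):
    """sigma_t^{ff'}(omega) as in Eq. (17); history holds the past pairs omega_{1:t-1}."""
    omega = np.atleast_2d(omega)
    prior = k_aug(omega, omega)[0, 0]
    if len(history) == 0:
        return np.sqrt(prior)
    K = k_aug(history, history)
    kvec = k_aug(history, omega)[:, 0]
    var = prior - kvec @ np.linalg.solve(K + lam * np.eye(len(history)), kvec)
    return np.sqrt(max(var, 0.0))  # clip tiny negative values from round-off

history = np.array([[0.1, 0.5], [0.3, 0.9]])    # two past duels (x_tau, x_tau')
s_new = sigma_aug([0.8, 0.2], history)          # a pair far from the history
s_seen = sigma_aug(history[0], history)         # a pair already compared
```

With these settings `s_seen` is much smaller than `s_new`, reflecting that the uncertainty measure shrinks near pairs that have already been compared.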

Theorem 3.6 (Duel-wise Error Bound).

For any estimate $\hat{f}_{t+1}\in\mathcal{B}_{f}^{t+1}$ measurable with respect to $\mathcal{F}_{t}$, we have, with probability at least $1-\delta$, $\forall t\geq 1,(x,x^{\prime})\in\mathcal{X}\times\mathcal{X}$,

$$\big|(\hat{f}_{t+1}(x)-\hat{f}_{t+1}(x^{\prime}))-(f(x)-f(x^{\prime}))\big|\leq 2\left(2B+\lambda^{-1/2}\sqrt{\beta(\epsilon,\delta/2,t)}\right)\sigma^{ff^{\prime}}_{t+1}((x,x^{\prime})). \tag{18}$$
Remark 3.7.

In preferential BO, we do not get the scalar value of $f(x)-f(x^{\prime})$. Hence, $\sigma_{t}^{ff^{\prime}}$ cannot be interpreted as the posterior standard deviation as in (Srinivas et al., 2012). However, it turns out that $\sigma_{t}^{ff^{\prime}}$, as a measure of uncertainty, still accounts for a factor of the duel-wise error.

To characterize the complexity of this augmented RKHS, we use the maximum information gain (Srinivas et al., 2012),

$$\gamma^{ff^{\prime}}_{T}:=\max_{\Omega\subset\mathcal{X}\times\mathcal{X};\,|\Omega|=T}\frac{1}{2}\log\left|I+\lambda^{-1}K^{ff^{\prime}}_{\Omega}\right|, \tag{19}$$

where $K^{ff^{\prime}}_{\Omega}=\left(k^{ff^{\prime}}((x,x^{\prime}),(\bar{x},\bar{x}^{\prime}))\right)_{(x,x^{\prime}),(\bar{x},\bar{x}^{\prime})\in\Omega}$.
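The maximization in Eq. (19) is combinatorial, but the objective is easy to evaluate for any fixed set $\Omega$, and any such value is a lower bound on $\gamma^{ff^{\prime}}_{T}$. The sketch below evaluates it for two hand-picked sets, assuming a squared-exponential base kernel on one-dimensional inputs; the function name `info_gain` and all numerical settings are hypothetical.

```python
import numpy as np

def info_gain(Omega, lam=0.1, lengthscale=0.2):
    """0.5 * log det(I + K_Omega^{ff'} / lam) for a fixed set of pairs Omega."""
    x, xp = Omega[:, 0], Omega[:, 1]
    # Assumed squared-exponential base kernel evaluated on all pairs of inputs.
    gram = lambda a: np.exp(-0.5 * ((a[:, None] - a[None, :]) / lengthscale) ** 2)
    K = gram(x) + gram(xp)                      # additive-kernel Gram matrix
    _, logdet = np.linalg.slogdet(np.eye(len(Omega)) + K / lam)
    return 0.5 * logdet

spread = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
clustered = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
g_spread, g_clustered = info_gain(spread), info_gain(clustered)
```

Well-separated duels yield a larger value than repeated duels at the same pair, which is the sense in which the quantity measures how much a set of comparisons can reveal.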

4 Algorithm

4.1 Principled Optimistic Algorithm

We are now ready to give the optimistic algorithm in Alg. 1.

Algorithm 1 Principled Optimistic Preferential Bayesian Optimization (POP-BO).
1:  Given the initial point $x_{0}\in\mathcal{X}$ and set $\mathcal{B}^{1}_{f}=\mathcal{B}_{f}$.
2:  for $t\in[T]$ do
3:     Set the reference point $x_{t}^{\prime}=x_{t-1}$.
4:     Compute
$$x_{t}\in\arg\max_{x\in\mathcal{X}}\max_{\tilde{f}\in\mathcal{B}_{f}^{t}}\left(\tilde{f}(x)-\tilde{f}(x_{t}^{\prime})\right),$$
with the inner optimal function denoted as $\tilde{f}_{t}$.
5:     Query the comparison oracle to get the feedback result $\mathbf{1}_{t}$ and append the new data to $\mathcal{D}_{t}$.
6:     Update the maximum likelihood estimator $\hat{f}^{\mathrm{MLE}}_{t}$ and the posterior confidence set $\mathcal{B}^{t+1}_{f}$.
7:  end for

The key to Alg. 1 is line 4. The idea is to maximize the optimistic advantage of $\tilde{f}(x)$ over $\tilde{f}(x_{t}^{\prime})$, given the uncertainty of the black-box function $\tilde{f}\in\mathcal{B}_{f}^{t}$.
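The loop of Alg. 1 can be sketched on a toy problem as follows. This is a simplified stand-in and not the paper's computational method: the RKHS ball is replaced by a finite family of candidate functions `F` on a finite grid, the link is assumed logistic, $\beta_{1}$ is held at a constant, and the names `log_lik`, `pop_bo`, and `oracle` are hypothetical.

```python
import numpy as np

def log_lik(F, pairs, prefs):
    """Log-likelihood of every candidate (rows of F) under an assumed logistic link."""
    if len(pairs) == 0:
        return np.zeros(len(F))
    P = np.asarray(pairs)
    z = F[:, P[:, 0]] - F[:, P[:, 1]]            # shape (n_candidates, t)
    y = np.asarray(prefs, dtype=float)
    return np.sum(-y * np.logaddexp(0.0, -z) - (1 - y) * np.logaddexp(0.0, z), axis=1)

def pop_bo(F, oracle, T, beta1=1.0, x0=0):
    """Optimistic duel selection over a finite function class and a finite grid."""
    pairs, prefs, x_prev = [], [], x0
    for _ in range(T):
        ll = log_lik(F, pairs, prefs)
        conf = F[ll >= ll.max() - beta1]         # current confidence set
        x_ref = x_prev                           # line 3: reference point
        adv = conf - conf[:, [x_ref]]            # f(x) - f(x_ref) for each member
        x_t = int(np.argmax(adv.max(axis=0)))    # line 4: optimistic advantage
        prefs.append(oracle(x_t, x_ref))         # line 5: 1 if x_t is preferred
        pairs.append((x_t, x_ref))               # line 6 happens at the next log_lik call
        x_prev = x_t
    return pairs, prefs

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 30)
F = np.array([a * np.sin(4 * grid + b)
              for a in (0.5, 1.0, 2.0) for b in np.linspace(0.0, 3.0, 7)])
f_true = F[10]                                   # hypothetical ground truth

def oracle(i, j):
    """Simulated comparison oracle driven by the ground-truth candidate."""
    return int(rng.random() < 1.0 / (1.0 + np.exp(-(f_true[i] - f_true[j]))))

pairs, prefs = pop_bo(F, oracle, T=30)
```

In this finite setting the inner maximization over the confidence set is a plain enumeration; in the RKHS setting it is an optimization problem in its own right.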

In line 3, we set the reference point $x_{t}^{\prime}$ as the last generated point $x_{t-1}$. In practice, this may correspond to two possible scenarios. In the first, each comparison requires one experiment, as in image quality comparison. In this case, we only need to set one point of the compared pair to be the last generated solution. In the second, comparing $x_{t}$ and $x_{t}^{\prime}$ requires separate experiments for $x_{t}$ and $x_{t}^{\prime}$. For example, when optimizing building thermal comfort, the occupants need to experience both thermal conditions to report a preference. If at step $t$ the oracle still remembers the experience with input $x_{t-1}$, we can directly compare $x_{t}$ and $x_{t-1}$. In this case, setting $x_{t}^{\prime}$ to be $x_{t-1}$ saves the experimental expense of evaluating $x_{t}^{\prime}$.

For online applications, cumulative regret is of greater interest. However, in an offline optimization setting, it may be of more interest to identify one near-optimal solution to report. Unlike in the scalar evaluation setting, where we can directly use the scalar value to report the best observed solution, we cannot directly identify the best sampled solution in the preferential Bayesian optimization scenario. To address this issue, we report the solution $x_{t^{\star}}$, where

$$t^{\star}\in\arg\min_{t\in[T]} 2\left(2B+\lambda^{-1/2}\sqrt{\beta(\epsilon,\delta/2,t)}\right)\sigma_{t}^{ff'}((x_{t},x_{t}')). \tag{20}$$

The idea is that, although the best sample may not be known, we can select a solution $x_{t^\star}$ to report by minimizing the known term $2\big(2B+\lambda^{-1/2}\sqrt{\beta(\epsilon,\delta/2,t)}\big)\sigma_t^{ff'}((x_t,x_t'))$. Indeed, this term upper bounds the uncertainty of the optimistic advantage (as shown in Thm. 3.6). Hence, the smaller it is, the more certain we are that $f(x_t)$ is close to the ground-truth optimal value. At step $t$, we can report the current estimated solution with index $\tau^\star(t)$ satisfying a formula similar to Eq. (20).
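The reporting rule in Eq. (20) can be illustrated with a minimal sketch. It assumes that the posterior standard deviations $\sigma_t^{ff'}((x_t,x_t'))$ and the values $\beta(\epsilon,\delta/2,t)$ have already been computed and stored in arrays; the function name `report_index` and its arguments are hypothetical and not part of the original implementation.

```python
import numpy as np


def report_index(sigmas, betas, B, lam):
    """Pick t* as in Eq. (20), given precomputed per-step quantities.

    sigmas[t-1] is assumed to hold sigma_t^{ff'}((x_t, x_t')) and
    betas[t-1] is assumed to hold beta(eps, delta/2, t), for t = 1..T.
    Returns the 1-based index t* and the per-step scores.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    # lam^{-1/2} * sqrt(beta_t) is written as sqrt(beta_t / lam).
    scores = 2.0 * (2.0 * B + np.sqrt(betas / lam)) * sigmas
    return int(np.argmin(scores)) + 1, scores
```

For instance, `report_index([0.9, 0.5, 0.6], [1.0, 2.0, 3.0], B=1.0, lam=1.0)` selects the second step, since its score is the smallest of the three.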

4.2 Efficient Computations

Line 4 in Alg. 1 requires solving a nested optimization problem with inner variables in an infinite-dimensional function space. The update of the maximum likelihood estimator also requires solving an optimization problem with an infinite-dimensional function as the decision variable. These are in general not tractable in their current forms. Fortunately, we can reduce the infinite-dimensional problems to finite-dimensional ones, thanks to the structures of the problem and the representer theorem (Schölkopf et al., 2001).

Maximum likelihood estimation. Since the log-likelihood function

$$\begin{aligned}
\ell_t(\tilde{f}) &= \log\mathbb{P}_{\tilde{f}}\big((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t}\big) \\
&= \sum_{\tau=1}^{t}\left(z_\tau\mathbf{1}_\tau+z_\tau'(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\log\left(e^{z_\tau}+e^{z_\tau'}\right)
\end{aligned} \tag{21}$$

only depends on the function values $(z_\tau,z_\tau')=(\tilde{f}(x_\tau),\tilde{f}(x_\tau'))$, we only need to optimize over $(z_\tau,z_\tau')$, subject to the constraint that they are the values of a function in $\mathcal{H}_k$ with norm less than or equal to $B$. Furthermore, Alg. 1 sets $x_\tau'=x_{\tau-1}$ and thus $z_\tau'=z_{\tau-1}$. So we can reduce the optimization variables to only $(z_\tau)_{\tau=0}^{t}$. Hence, Eq. (21) is reduced to the following log-likelihood function that only depends on $(z_\tau)_{\tau=0}^{t}$,

$$\ell(Z_{0:t}\mid\mathcal{D}_t) := Z_{1:t}^{\top}\mathbf{1}_{1:t}+Z_{0:t-1}^{\top}(1-\mathbf{1}_{1:t})-\sum_{\tau=1}^{t}\log\left(e^{z_\tau}+e^{z_{\tau-1}}\right), \tag{22}$$

where $Z_{0:t} := (z_\tau)_{\tau=0}^{t}$, $Z_{1:t} := (z_\tau)_{\tau=1}^{t}$, $Z_{0:t-1} := (z_\tau)_{\tau=0}^{t-1}$ and $\mathbf{1}_{1:t} = (\mathbf{1}_\tau)_{\tau=1}^{t}$.
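As a small illustration of Eq. (22), the reduced log-likelihood can be evaluated directly from the vector of function values and the preference indicators. The sketch below assumes NumPy arrays as inputs; the function name `log_likelihood` is a placeholder chosen for this example.

```python
import numpy as np


def log_likelihood(Z, prefs):
    """Evaluate Eq. (22).

    Z     : values (z_0, ..., z_t), length t + 1
    prefs : indicators (1_1, ..., 1_t), length t
    """
    Z = np.asarray(Z, dtype=float)
    prefs = np.asarray(prefs, dtype=float)
    z_new, z_ref = Z[1:], Z[:-1]          # Z_{1:t} and Z_{0:t-1}
    linear = z_new @ prefs + z_ref @ (1.0 - prefs)
    # logaddexp computes log(e^a + e^b) in a numerically stable way.
    log_norm = np.logaddexp(z_new, z_ref).sum()
    return float(linear - log_norm)
```

With a single comparison and equal function values, e.g. `log_likelihood([0.0, 0.0], [1])`, the result is $-\log 2$, reflecting that both outcomes are equally likely.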

By the representer theorem (Schölkopf et al., 2001), the maximum likelihood estimation problem can be solved via,

$$\begin{aligned}
\ell_t(\hat{f}_t^{\mathrm{MLE}})=\max_{Z_{0:t}\in\mathbb{R}^{t+1}} \quad & \ell(Z_{0:t}\mid\mathcal{D}_t) \\
\text{subject to} \quad & Z_{0:t}^{\top}K_{0:t}^{-1}Z_{0:t}\leq B^{2},
\end{aligned} \tag{23}$$

where $K_{0:t} := (k(x_{\tau_1},x_{\tau_2}))_{\tau_1\in\{0\}\cup[t],\,\tau_2\in\{0\}\cup[t]}$. The constraint requires the function values to come from a function inside the function-space ball $\mathcal{B}_f$; its left-hand side is the minimum squared norm over all interpolants through $\{(x_\tau,z_\tau)\}_{\tau=0}^{t}$, as shown in (Wendland, 2004). It can be checked that the maximization problem in Eq. (23) has a concave objective (as shown in Appendix A) and a convex feasible set. Thus, the problem in Eq. (23) is tractable via convex optimization.
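To make the finite-dimensional problem in Eq. (23) concrete, the following is a minimal sketch, not the authors' implementation. It assumes a squared exponential kernel with unit variance, a small jitter on the kernel matrix, and SciPy's general-purpose SLSQP solver; the names `se_kernel` and `fit_mle` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def se_kernel(X, Y, lengthscale=1.0):
    """Squared exponential kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def fit_mle(X, prefs, B, jitter=1e-6):
    """Sketch of Eq. (23): rows of X are x_0..x_t, prefs holds 1_1..1_t."""
    prefs = np.asarray(prefs, dtype=float)
    K_inv = np.linalg.inv(se_kernel(X, X) + jitter * np.eye(len(X)))

    def neg_ll(Z):
        z_new, z_ref = Z[1:], Z[:-1]
        ll = z_new @ prefs + z_ref @ (1.0 - prefs)
        ll -= np.logaddexp(z_new, z_ref).sum()
        return -ll

    def neg_ll_grad(Z):
        # s is the modeled probability that x_tau beats x_{tau-1}.
        s = 1.0 / (1.0 + np.exp(-(Z[1:] - Z[:-1])))
        grad = np.zeros_like(Z)
        grad[1:] += prefs - s
        grad[:-1] += s - prefs
        return -grad

    # SLSQP "ineq" constraints mean fun(Z) >= 0, i.e. Z^T K^{-1} Z <= B^2.
    norm_con = {
        "type": "ineq",
        "fun": lambda Z: B**2 - Z @ K_inv @ Z,
        "jac": lambda Z: -2.0 * K_inv @ Z,
    }
    res = minimize(neg_ll, np.zeros(len(X)), jac=neg_ll_grad,
                   method="SLSQP", constraints=[norm_con])
    return res.x, -res.fun
```

Because the objective is concave and the feasible set is an ellipsoid, any local solver of this kind should in principle recover the global maximizer; a dedicated convex or interior-point solver may be preferable for larger $t$.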

Generating new sample point. In line 4 of Alg. 1, a bi-level optimization problem needs to be solved, where the inner-level part has an infinite-dimensional function variable. The inner optimization problem has the form,

$$\begin{aligned}
\max_{\tilde{f}} \quad & \tilde{f}(x)-\tilde{f}(x_t) \\
\text{subject to} \quad & \tilde{f}\in\mathcal{B}_f, \\
& \ell_t(\tilde{f})\geq\ell_t(\hat{f}^{\mathrm{MLE}}_t)-\beta_1(\epsilon,\delta,t),
\end{aligned} \tag{24}$$

where $\beta_1(\epsilon,\delta,t)$ is as given in Thm. 3.1. Similar to the representer theorem, we have,

Lemma 4.1.

Prob. (24) can be equivalently reduced to,

$$\begin{aligned}
\max_{Z_{0:t}\in\mathbb{R}^{t+1},\,z\in\mathbb{R}} \quad & z-z_t \\
\text{subject to} \quad & \begin{bmatrix} Z_{0:t} \\ z \end{bmatrix}^{\top}K_{0:t,x}^{-1}\begin{bmatrix} Z_{0:t} \\ z \end{bmatrix}\leq B^{2}, \\
& \ell(Z_{0:t}\mid\mathcal{D}_t)\geq\ell_t(\hat{f}^{\mathrm{MLE}}_t)-\beta_1(\epsilon,\delta,t),
\end{aligned} \tag{25}$$

where

$$K_{0:t,x}=\begin{bmatrix} K_{0:t} & (k(x_\tau,x))_{\tau=0}^{t} \\ \big((k(x_\tau,x))_{\tau=0}^{t}\big)^{\top} & k(x,x) \end{bmatrix}. \tag{26}$$

Similarly, it can be checked that Prob. (25) is convex.
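For a fixed candidate $x$, Prob. (25) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the paper's solver: it uses a unit-variance squared exponential kernel, SciPy's SLSQP with finite-difference gradients, and takes the right-hand side $\ell_t(\hat{f}^{\mathrm{MLE}}_t)-\beta_1(\epsilon,\delta,t)$ as a precomputed argument `ll_threshold`. All function and argument names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def se_kernel(X, Y, lengthscale=1.0):
    """Squared exponential kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def log_lik(Z, prefs):
    """Reduced log-likelihood of Eq. (22)."""
    z_new, z_ref = Z[1:], Z[:-1]
    return z_new @ prefs + z_ref @ (1.0 - prefs) - np.logaddexp(z_new, z_ref).sum()


def optimistic_advantage(X, prefs, x, B, ll_threshold, jitter=1e-6):
    """Sketch of Prob. (25) for a fixed candidate x.

    The decision vector is v = [z_0, ..., z_t, z], so v[-2] is z_t
    and v[-1] is the value z at the candidate x.
    """
    prefs = np.asarray(prefs, dtype=float)
    X_aug = np.vstack([X, x[None, :]])               # inputs behind K_{0:t,x}
    K_inv = np.linalg.inv(se_kernel(X_aug, X_aug) + jitter * np.eye(len(X_aug)))
    cons = [
        # Norm-ball constraint: v^T K_{0:t,x}^{-1} v <= B^2.
        {"type": "ineq", "fun": lambda v: B**2 - v @ K_inv @ v},
        # Likelihood constraint on Z_{0:t} only.
        {"type": "ineq", "fun": lambda v: log_lik(v[:-1], prefs) - ll_threshold},
    ]
    res = minimize(lambda v: -(v[-1] - v[-2]), np.zeros(len(X_aug)),
                   method="SLSQP", constraints=cons)
    return -res.fun, res.x
```

The starting point $v=0$ is feasible only if `ll_threshold` does not exceed $-t\log 2$; otherwise a feasible start, such as the maximum likelihood solution extended by its interpolant value at $x$, would be a more suitable choice.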

For low-dimensional $x$, the outer-level problem can be solved via grid search. For medium-dimensional problems, we can optimize the inner variables using a gradient-based method and the outer variables using a zero-order optimization method. Alternatively, we can jointly optimize $x$, $Z_{0:t}$, and $z$ with a nonlinear programming solver started from multiple random initial conditions. That is, we add $x$ as another optimization variable, as shown in Prob. (27),

$$\begin{aligned}
\max_{Z_{0:t}\in\mathbb{R}^{t+1},\,z\in\mathbb{R},\,x\in\mathcal{X}} \quad & z-z_t \\
\text{subject to} \quad & \text{Constraints of Prob. (25)}.
\end{aligned} \tag{27}$$

More details on this joint optimization approach are in Appendix H.

Remark 4.2.

We add a matrix $\epsilon_K I$ to $K_{0:t}$ and $K_{0:t,x}$ before inversion to avoid numerical issues, where $\epsilon_K>0$ is small.
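The bordered matrix of Eq. (26) and the regularization of Remark 4.2 can be sketched as follows, assuming a unit-variance squared exponential kernel. The names `bordered_gram` and `rkhs_norm_sq` and the default value of `eps_K` are illustrative choices, not values taken from the paper.

```python
import numpy as np


def se_kernel(X, Y, lengthscale=1.0):
    """Squared exponential kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def bordered_gram(X, x, eps_K=1e-6):
    """Build K_{0:t,x} of Eq. (26) and add eps_K * I as in Remark 4.2."""
    K = se_kernel(X, X)                       # K_{0:t}
    k_x = se_kernel(X, x[None, :])            # column (k(x_tau, x))_tau
    k_xx = se_kernel(x[None, :], x[None, :])  # scalar block k(x, x)
    K_x = np.block([[K, k_x], [k_x.T, k_xx]])
    return K_x + eps_K * np.eye(len(K_x))


def rkhs_norm_sq(K_reg, values):
    """Evaluate values^T K_reg^{-1} values via a Cholesky factor."""
    L = np.linalg.cholesky(K_reg)
    w = np.linalg.solve(L, np.asarray(values, dtype=float))
    return float(w @ w)
```

Using a Cholesky factor instead of an explicit inverse is a common design choice here: it fails loudly when the regularized matrix is not positive definite, which signals that `eps_K` is too small for the given inputs.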

Remark 4.3.

In this paper, we mainly consider the setting where, in each step, the preference is queried over two candidate points. Our Alg. 1 and the efficient computation schemes in this section can be easily extended to the multiple-choice setting, where in each step the best or most preferred point is queried over a batch of candidates. The detailed discussion is in Appendix I.

5 Theoretical Analysis

We first introduce the performance metrics to use. As in the standard Bayesian optimization setting (Srinivas et al., 2012), we use the cumulative regret defined in Eq. (28),

$$R_T := \sum_{t=1}^{T}\left(f(x^{\star})-f(x_t)\right), \tag{28}$$

where $x^{\star}\in\arg\max_{x\in\mathcal{X}}f(x)$.
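In simulation studies where the ground-truth function is known, Eq. (28) can be tracked with a one-line computation. The sketch below assumes the optimal value $f(x^\star)$ and the values $f(x_t)$ at the queried points are available; the function name is a placeholder.

```python
import numpy as np


def cumulative_regret(f_opt, f_queried):
    """Return the sequence (R_1, ..., R_T) of Eq. (28).

    f_opt     : the optimal value f(x*)
    f_queried : the values f(x_1), ..., f(x_T) at the queried points
    """
    instant = f_opt - np.asarray(f_queried, dtype=float)
    return np.cumsum(instant)
```

For example, with `f_opt = 1.0` and queried values `[0.2, 0.7, 1.0]`, the instantaneous regrets are `0.8, 0.3, 0.0`, so the cumulative regret stops growing once the optimum is queried.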

Remark 5.1.

The cumulative regret $R_T$ as defined in Eq. (28) does not explicitly consider the sub-optimality of the reference point $x_t'$. However, since $x_t'=x_{t-1}$, the cumulative regret of the reference points is the same as $R_T$ in Eq. (28), up to the difference of the first/last term.

Cumulative regret is of interest in the online setting. In the offline optimization setting, it is of more interest to analyze the sub-optimality of the final reported solution, i.e.,

$$f(x^{\star})-f(x_{t^{\star}}), \tag{29}$$

where $x_{t^{\star}}$ is the final reported solution as defined in Eq. (20).

5.1 Regret Bound and Convergence Rate

Theorem 5.2 (Cumulative Regret Bound).

With probability at least $1-\delta$, the cumulative regret of Alg. 1 satisfies,

$$R_T=\mathcal{O}\left(\sqrt{\beta_T\gamma^{ff'}_T T}\right), \tag{30}$$

where

$$\beta_T=\beta(1/T,\delta,T)=\mathcal{O}\left(\sqrt{T\log\frac{T\,\mathcal{N}(\mathcal{B}_f,1/T,\|\cdot\|_{\infty})}{\delta}}\right).$$

Remark 5.3 (Differentiate from GP-UCB regret).

Our bound has a form similar to the well-known regret bound for standard GP-UCB type algorithms (Srinivas et al., 2012; Chowdhury & Gopalan, 2017a). However, the $\beta_T$ term here is significantly different from that in the existing literature (e.g., Thm. 3 in (Srinivas et al., 2012)). It is derived specifically for preferential BO and leads to slightly larger bounds for the specific kernels in Sec. 5.2.

We highlight that Thm. 5.2 provides the first-of-its-kind information-theoretic bound on the cumulative regret of preferential BO, which further allows us to derive a convergence rate for the reported solution $x_{t^{\star}}$ in Thm. 5.4.

Theorem 5.4 (Convergence Guarantee).

Let $t^{\star}$ be defined as in Eq. (20). With probability at least $1-\delta$,

$$f(x^{\star})-f(x_{t^{\star}})\leq\mathcal{O}\left(\frac{\sqrt{\beta_T\gamma^{ff'}_T}}{\sqrt{T}}\right). \tag{31}$$

Thm. 5.4 highlights that by minimizing the known term $2\big(2B+\lambda^{-1/2}\sqrt{\beta(\epsilon,\delta/2,t)}\big)\sigma_t^{ff'}((x_t,x_t'))$, the reported final solution $x_{t^{\star}}$ has a guaranteed convergence rate.

5.2 Kernel-Specific Bounds and Rates

In this section, we show kernel-specific bounds for the regret and convergence rate for the reported solution. The explicit forms of the considered kernels are given in Appendix L.

Theorem 5.5 (Kernel-Specific Regret Bounds).

Setting $\epsilon=1/T$ and running our POP-BO algorithm in Alg. 1,

  1. If $k(x,y)=\langle x,y\rangle$, we have,

    $$R_T=\mathcal{O}\left(T^{3/4}(\log T)^{3/4}\right). \tag{32}$$

  2. If $k(x,y)$ is a squared exponential kernel, we have,

    $$R_T=\mathcal{O}\left(T^{3/4}(\log T)^{\frac{3}{4}(d+1)}\right). \tag{33}$$

  3. If $k(x,y)$ is a Matérn kernel, we have,

    $$R_T=\mathcal{O}\left(T^{3/4}(\log T)^{3/4}\,T^{\frac{d}{\nu}\left(\frac{1}{4}+\frac{d+1}{4+2(d+1)d/\nu}\right)}\right), \tag{34}$$

    where $\nu$ is the smoothness parameter of the Matérn kernel, which is assumed to be large enough such that $\nu>\frac{d}{4}\left(3+d+\sqrt{d^{2}+14d+17}\right)=\Theta(d^{2})$.
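As a quick numerical check of the Matérn case, the sketch below evaluates the polynomial exponent of $T$ in Eq. (34), ignoring the logarithmic factor, together with the stated threshold on $\nu$. Setting the exponent equal to $1$ and solving for $\nu$ recovers exactly this threshold, so the condition on $\nu$ coincides with the regime in which the bound is sublinear in $T$. The function names are chosen for this example only.

```python
import math


def matern_nu_threshold(d):
    """Lower bound on nu stated after Eq. (34)."""
    return d / 4.0 * (3 + d + math.sqrt(d**2 + 14 * d + 17))


def matern_regret_exponent(d, nu):
    """Exponent of T in Eq. (34), with the (log T)^{3/4} factor dropped."""
    r = d / nu
    return 0.75 + r * (0.25 + (d + 1) / (4 + 2 * (d + 1) * r))
```

For $d=1$, the threshold is $1+\sqrt{2}\approx 2.41$; with $\nu=10$ the exponent is roughly $0.82$, while with $\nu=2$ it exceeds $1$ and the bound becomes vacuous.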

Remark 5.6 (Comparison to GP-UCB with Scalar Feedback).

Interestingly, as compared to the kernel-specific bounds in the scalar evaluation-based optimization (Fig. 1 in (Srinivas et al., 2012)), the regret bound of preferential Bayesian optimization approximately has an additional factor of $T^{1/4}$. This is reasonable since, intuitively, scalar evaluation can imply preference, but not vice versa. Therefore, preference feedback contains less information and thus may suffer from higher regret. Fig. 2 in Sec. 6.1 and Fig. 4 in Appendix N empirically verify our bounds here.

We then derive the kernel-specific convergence rates for the reported solution $x_{t^{\star}}$, as shown in Tab. 3 in Appendix O.

6 Experimental Results

In this section, we compare our method to the state-of-the-art preferential BO methods on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem. The comparison outcome is sampled as assumed in Assump. 2.5. We implement our algorithm based on the Gaussian process package GPy (GPy, since 2012). The optimization problems for MLE and generating new samples are formulated and solved using CasADi (Andersson et al., 2019) and Ipopt (Wächter & Biegler, 2006). We compare our method to three baseline methods: dueling Thompson sampling (González et al., 2017), skew-GP based preferential BO (Takeno et al., 2023), and qEUBO (Astudillo et al., 2023). The dueling Thompson sampling method (González et al., 2017) derives the next pair to compare by maximizing the soft-Copeland's score. The skew-GP based method (Takeno et al., 2023) applies standard BO algorithms conditioned on the Thompson sampling results on the historical sample points that are consistent with the historical preference feedback. The qEUBO method (Astudillo et al., 2023) uses the expected utility of the best option as an acquisition function. More experimental details and results on thermal comfort optimization are given in Appendix P.
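A simulated preference oracle of the kind used in such experiments might look like the sketch below. It assumes the logistic (Bradley-Terry-type) link that the log-likelihood in Eq. (21) suggests, with $\mathbf{1}_\tau=1$ read as the event that $x_\tau$ is preferred to $x_\tau'$; the function name and the use of NumPy's random generator are illustrative assumptions.

```python
import numpy as np


def sample_preference(f_x, f_x_ref, rng):
    """Draw one preference outcome under an assumed logistic link.

    Returns (indicator, p), where p = e^{f_x} / (e^{f_x} + e^{f_x_ref})
    is the probability that x is preferred to the reference point, and
    indicator is 1 with probability p and 0 otherwise.
    """
    p = 1.0 / (1.0 + np.exp(-(f_x - f_x_ref)))
    return int(rng.random() < p), p
```

With equal function values the outcome is a fair coin flip, and the feedback becomes nearly deterministic only when the two values differ by several units, which is why individual comparisons can be misleading.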

6.1 Sampled Instances from Gaussian Process

In this section, we sample the black-box function $f$ from a Gaussian process with the squared exponential kernel as shown in Appendix L, where the variance parameter is $9.0$ and the lengthscale is $1.0$. We sampled $30$ instances in total.
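One way to draw such instances on a finite grid is sketched below with plain NumPy. The variance and lengthscale follow the values stated above, while the grid, the jitter, the seeding scheme, and the function name are assumptions made for this illustration; the authors' implementation is based on GPy and may differ.

```python
import numpy as np


def sample_gp_instance(grid, variance=9.0, lengthscale=1.0, jitter=1e-6, seed=0):
    """Draw one GP sample path on `grid` (shape (n, d)) with an SE kernel."""
    rng = np.random.default_rng(seed)
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    K = variance * np.exp(-0.5 * d2 / lengthscale**2)
    # Jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(grid)))
    return L @ rng.standard_normal(len(grid))
```

Calling the function with 30 different seeds on the same grid would give 30 independent instances in the spirit of the setup described above.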

Fig. 2 shows the performance comparisons with baselines. Our method achieves the lowest sublinear growth in cumulative regret. It also achieves better/competitive convergence speed for the reported solution as compared to the DTS method, while outperforming the SGP.

However, our method uses less than $10\%$ of the computation time of DTS, as shown in Tab. 1. The SGP method gets stuck in a local optimum because it overly trusts the random preference feedback (which is treated as a hard constraint when doing Thompson sampling). Although the qEUBO method performs slightly better in the reported solution, it suffers from more than $2.5$ times the cumulative regret of our method. Similar to qEUBO (which reports the posterior mean maximizer), we can report the maximizer of the minimum-norm $\hat{f}_t^{\mathrm{MLE}}$ (POP-BO max-MLE in Fig. 2) instead of $x_{t^{\star}}$ in Eq. (20), and achieve faster convergence than qEUBO.

Figure 2: Cumulative regret and the suboptimality of the reported solution, where the shaded areas represent $\pm 0.1$ standard deviation. qEUBO represents the method in (Astudillo et al., 2023), which reports the solution that maximizes the expected objective value conditioned on the historical samples. SGP represents the skew-GP based method (Takeno et al., 2023), which reports the first point of the duel proposed by the algorithm in the last step. DTS represents the dueling Thompson sampling method in (González et al., 2017), which reports the Condorcet winner.
Table 1: Computation time normalized against the DTS method.
| DTS | qEUBO | SGP | POP-BO (ours) |
| --- | --- | --- | --- |
| $\mathbf{1.0}$ | $0.21$ | $0.07$ | $0.09$ |

6.2 Test Function Optimization

In this section, we compare our method with the baselines on several well-known global optimization test functions (Dixon, 1978; Molga & Smutnicki, 2005), each of which is divided by the standard deviation of its values sampled over a grid. We run our method multiple times from different random initial points. Tab. 2 shows that POP-BO consistently finds better or comparable solutions as compared to other baselines.
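The normalization step can be sketched as follows, using the standard Branin function as an example. The grid resolution, the helper name `normalize_by_grid_std`, and the use of the usual Branin domain $[-5,10]\times[0,15]$ are assumptions for illustration; the paper does not state them in this section, nor the sign convention used to turn a test function into a maximization objective.

```python
import numpy as np

def branin(x1, x2):
    """Standard Branin test function (minimization form)."""
    a, b, c = 1.0, 5.1 / (4.0 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8.0 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1.0 - t) * np.cos(x1) + s

def normalize_by_grid_std(func, bounds, points_per_dim=50):
    """Divide func by the standard deviation of its values over a regular grid."""
    axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
    mesh = np.meshgrid(*axes, indexing="ij")
    scale = float(np.std(func(*mesh)))

    def normalized(*args):
        return func(*args) / scale

    return normalized, scale

branin_scaled, branin_scale = normalize_by_grid_std(
    branin, [(-5.0, 10.0), (0.0, 15.0)]
)
```

Scaling in this way puts functions with very different output ranges on a comparable footing, which matters because the preference probability depends on the magnitude of the difference in function values.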

Table 2: Suboptimality of the final reported solution after 30 steps. The results (mean $\pm$ standard deviation) are taken over 30 runs with random starting points.
| Problem | DTS | qEUBO | SGP | POP-BO (ours) |
| --- | --- | --- | --- | --- |
| Beale | $0.84 \pm 0.52$ | $0.15 \pm 0.52$ | $0.10 \pm 0.19$ | $\mathbf{0.008 \pm 0.025}$ |
| Branin | $1.35 \pm 1.16$ | $0.71 \pm 1.16$ | $2.20 \pm 0.81$ | $\mathbf{0.31 \pm 0.29}$ |
| Bukin | $1.45 \pm 1.13$ | $0.59 \pm 1.20$ | $1.27 \pm 0.80$ | $\mathbf{0.92 \pm 0.54}$ |
| Cross-in-Tray | $1.56 \pm 1.39$ | $2.03 \pm 1.82$ | $1.79 \pm 1.49$ | $\mathbf{1.38 \pm 0.97}$ |
| Eggholder | $3.08 \pm 0.55$ | $3.11 \pm 0.55$ | $1.87 \pm 0.94$ | $\mathbf{1.83 \pm 0.96}$ |
| Holder Table | $3.21 \pm 1.38$ | $3.20 \pm 1.38$ | $1.56 \pm 1.62$ | $\mathbf{1.22 \pm 1.01}$ |
| Levy13 | $2.36 \pm 1.22$ | $1.06 \pm 1.22$ | $1.29 \pm 1.00$ | $\mathbf{0.35 \pm 0.31}$ |

6.3 Scalability to Higher Dimension

To demonstrate the computational scalability of our joint optimization approach (as shown in Prob. (27)), we consider a set of higher-dimensional problems. Due to space limitations, we show the results for the optimization of a $12$-dimensional black-box function sampled from a Gaussian process with the squared exponential kernel. More results can be found in Appendix P.1 and Appendix P.2.2. The optimization domain is set to be $[0,10]^{12}$. We run $10$ randomly sampled instances for $100$ steps. The average update time per step is only $18.0$ seconds on a personal computer with one Intel64 Family 6 Model 142 Stepping 12 GenuineIntel 1803 MHz processor and 16.0 GB RAM. This is comparatively small considering that each query to the comparison oracle can be very expensive in practice (e.g., heating the room up to a certain temperature to evaluate occupant comfort, which may take tens of minutes). We compare our method to the SGP baseline, which is one of the state-of-the-art computationally practical preferential Bayesian optimization methods. Fig. 3 shows the cumulative regret (in log scale) and the suboptimality of the reported solution for the problem. It can be seen that our algorithm still achieves sublinear regret growth and good convergence for the suboptimality of the reported solution within $100$ steps in this $12$-dimensional problem. Fig. 3 also shows that POP-BO converges faster in this higher-dimensional problem and thus scales better than the SGP method.

Figure 3: Cumulative regret in log scale and the suboptimality of the reported solution in linear scale for a $12$-dimensional problem sampled from a Gaussian process. For reference, we also plot $T$ in the cumulative regret plot in log scale. The shaded areas represent $\pm 0.2$ standard deviation.

7 Conclusion and Future Work

In this paper, we have presented a principled optimistic preferential BO algorithm, based on the likelihood-based confidence set. An efficient computational method is developed to implement the algorithm. We further show an information-theoretic bound on the cumulative regret, a first-of-its-kind for preferential BO. We also design a scheme to report an estimated optimal solution, with a guaranteed convergence rate. Experimental results show that our method achieves better or competitive performance as compared to the state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret. Future work includes extensions to safety-critical problems (Berkenkamp et al., 2016; Guo et al., 2023) and game-theoretic settings. The likelihood-based confidence set and the error bound in Sec. 3 can also be applied to more scenarios with preference feedback.

Acknowledgements

This research was supported by the Swiss National Science Foundation under NCCR Automation, grant agreement 51NF40_180545, the Swiss Federal Office of Energy SFOE as part of the SWEET consortium SWICE, and in part by the Swiss Data Science Center, grant agreement C20-13.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

References

  • Abdelrahman & Miller (2022) Abdelrahman, M. M. and Miller, C. Targeting occupant feedback using digital twins: Adaptive spatial–temporal thermal preference sampling to optimize personal comfort models. Building and Environment, 218:109090, 2022.
  • Andersson et al. (2019) Andersson, J. A., Gillis, J., Horn, G., Rawlings, J. B., and Diehl, M. CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 11(1):1–36, 2019.
  • Astudillo et al. (2023) Astudillo, R., Lin, Z. J., Bakshy, E., and Frazier, P. qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, pp.  1093–1114. PMLR, 2023.
  • Berkenkamp et al. (2016) Berkenkamp, F., Schoellig, A. P., and Krause, A. Safe controller optimization for quadrotors with Gaussian processes. In 2016 IEEE international conference on robotics and automation (ICRA), pp.  491–496. IEEE, 2016.
  • Bradley & Terry (1952) Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  • Bull (2011) Bull, A. D. Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12(10), 2011.
  • Chowdhury & Gopalan (2017a) Chowdhury, S. R. and Gopalan, A. On kernelized multi-armed bandits. In International Conference on Machine Learning, pp.  844–853. PMLR, 2017a.
  • Chowdhury & Gopalan (2017b) Chowdhury, S. R. and Gopalan, A. On kernelized multi-armed bandits. arXiv preprint arXiv:1704.00445, 2017b.
  • Christiano et al. (2017) Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., and Amodei, D. Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 2017.
  • Christmann & Hable (2012) Christmann, A. and Hable, R. Consistency of support vector machines using additive kernels for additive models. Computational Statistics & Data Analysis, 56(4):854–873, 2012.
  • Curi et al. (2020) Curi, S., Berkenkamp, F., and Krause, A. Efficient model-based reinforcement learning through optimistic policy search and planning. Advances in Neural Information Processing Systems, 33:14156–14170, 2020.
  • Dixon (1978) Dixon, L. C. W. The global optimization problem: an introduction. Towards Global Optimisation 2, pp.  1–15, 1978.
  • Dudík et al. (2015) Dudík, M., Hofmann, K., Schapire, R. E., Slivkins, A., and Zoghi, M. Contextual dueling bandits. In Conference on Learning Theory, pp.  563–587. PMLR, 2015.
  • Edmunds & Triebel (1996) Edmunds, D. E. and Triebel, H. Function spaces, entropy numbers, differential operators, volume 120. Cambridge Univ Pr, 1996.
  • Emmenegger et al. (2023) Emmenegger, N., Mutny, M., and Krause, A. Likelihood ratio confidence sets for sequential decision making. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Fanger et al. (1970) Fanger, P. O. et al. Thermal comfort. analysis and applications in environmental engineering. Thermal comfort. Analysis and applications in environmental engineering., 1970.
  • Frazier (2018) Frazier, P. I. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • Gajane et al. (2015) Gajane, P., Urvoy, T., and Clérot, F. A relative exponential weighing algorithm for adversarial utility-based dueling bandits. In International Conference on Machine Learning, pp.  218–227. PMLR, 2015.
  • González et al. (2017) González, J., Dai, Z., Damianou, A., and Lawrence, N. D. Preferential Bayesian optimization. In International Conference on Machine Learning, pp.  1282–1291. PMLR, 2017.
  • GPy (since 2012) GPy. GPy: A Gaussian process framework in python. http://github.com/SheffieldML/GPy, since 2012.
  • Griffith et al. (2013) Griffith, S., Subramanian, K., Scholz, J., Isbell, C. L., and Thomaz, A. L. Policy shaping: Integrating human feedback with reinforcement learning. Advances in Neural Information Processing Systems, 26, 2013.
  • Guo et al. (2023) Guo, B., Jiang, Y., Kamgarpour, M., and Ferrari-Trecate, G. Safe zeroth-order convex optimization using quadratic local approximations. In 2023 European Control Conference (ECC), pp.  1–8. IEEE, 2023.
  • Hiranaka et al. (2023) Hiranaka, A., Hwang, M., Lee, S., Wang, C., Fei-Fei, L., Wu, J., and Zhang, R. Primitive skill-based robot learning from human evaluative feedback. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.  7817–7824. IEEE, 2023.
  • Jones et al. (1998) Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
  • Kahneman & Tversky (2013) Kahneman, D. and Tversky, A. Prospect theory: An analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I, pp.  99–127. World Scientific, 2013.
  • Kandasamy et al. (2015) Kandasamy, K., Schneider, J., and Póczos, B. High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pp.  295–304. PMLR, 2015.
  • Koyama et al. (2020) Koyama, Y., Sato, I., and Goto, M. Sequential gallery for interactive visual design optimization. ACM Transactions on Graphics (TOG), 39(4):88–1, 2020.
  • Lalley (2013) Lalley, S. P. Concentration inequalities. Lecture notes, University of Chicago, 2013.
  • Li et al. (2021) Li, K., Tucker, M., Bıyık, E., Novoseller, E., Burdick, J. W., Sui, Y., Sadigh, D., Yue, Y., and Ames, A. D. ROIAL: Region of interest active learning for characterizing exoskeleton gait preference landscapes. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp.  3212–3218. IEEE, 2021.
  • Lichtenstein & Slovic (1971) Lichtenstein, S. and Slovic, P. Reversals of preference between bids and choices in gambling decisions. Journal of experimental psychology, 89(1):46, 1971.
  • Liu et al. (2023) Liu, Q., Netrapalli, P., Szepesvari, C., and Jin, C. Optimistic MLE: A generic model-based algorithm for partially observable sequential decision making. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pp.  363–376, 2023.
  • Lyu et al. (2023) Lyu, J., Shi, Y., Du, H., and Lian, Z. Sex-based thermal comfort zones and energy savings in spaces with joint operation of air conditioner and fan. Building and Environment, 246:111002, 2023. ISSN 0360-1323. doi: https://doi.org/10.1016/j.buildenv.2023.111002. URL https://www.sciencedirect.com/science/article/pii/S0360132323010296.
  • Maddalena et al. (2021) Maddalena, E. T., Scharnhorst, P., and Jones, C. N. Deterministic error bounds for kernel-based learning techniques under bounded noise. Automatica, 134:109896, 2021.
  • Mehta et al. (2023) Mehta, V., Neopane, O., Das, V., Lin, S., Schneider, J., and Neiswanger, W. Kernelized offline contextual dueling bandits. arXiv preprint arXiv:2307.11288, 2023.
  • Mikkola et al. (2020) Mikkola, P., Todorović, M., Järvi, J., Rinke, P., and Kaski, S. Projective preferential Bayesian optimization. In International Conference on Machine Learning, pp.  6884–6892. PMLR, 2020.
  • Molga & Smutnicki (2005) Molga, M. and Smutnicki, C. Test functions for optimization needs. Test functions for optimization needs, 101:48, 2005.
  • Negoescu et al. (2011) Negoescu, D. M., Frazier, P. I., and Powell, W. B. The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS Journal on Computing, 23(3):346–363, 2011.
  • Newey & McFadden (1994) Newey, W. K. and McFadden, D. Large sample estimation and hypothesis testing. Handbook of econometrics, 4:2111–2245, 1994.
  • Osband & Van Roy (2014) Osband, I. and Van Roy, B. Model-based reinforcement learning and the Eluder dimension. Advances in Neural Information Processing Systems, 27, 2014.
  • Ouyang et al. (2022) Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
  • Owen (1990) Owen, A. Empirical likelihood ratio confidence regions. The Annals of Statistics, 18(1):90–120, 1990.
  • Pacchiano et al. (2021) Pacchiano, A., Ball, P., Parker-Holder, J., Choromanski, K., and Roberts, S. Towards tractable optimism in model-based reinforcement learning. In Uncertainty in Artificial Intelligence, pp.  1413–1423. PMLR, 2021.
  • Saha & Krishnamurthy (2022) Saha, A. and Krishnamurthy, A. Efficient and optimal algorithms for contextual dueling bandits under realizability. In International Conference on Algorithmic Learning Theory, pp.  968–994. PMLR, 2022.
  • Saha et al. (2021) Saha, A., Koren, T., and Mansour, Y. Dueling convex optimization. In International Conference on Machine Learning, pp.  9245–9254. PMLR, 2021.
  • Schölkopf et al. (2001) Schölkopf, B., Herbrich, R., and Smola, A. J. A generalized representer theorem. In International Conference on Computational Learning Theory, pp.  416–426. Springer, 2001.
  • Shahriari et al. (2015) Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2015.
  • Snoek et al. (2012) Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Inf. Process. Syst., 25, 2012.
  • Srinivas et al. (2012) Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.
  • Sui et al. (2017) Sui, Y., Zhuang, V., Burdick, J. W., and Yue, Y. Multi-dueling bandits with dependent arms. arXiv preprint arXiv:1705.00253, 2017.
  • Sui et al. (2018) Sui, Y., Burdick, J., Yue, Y., et al. Stage-wise safe Bayesian optimization with Gaussian processes. In Proc. of the Int. Conf. on Mach. Learn., pp.  4781–4789, 2018.
  • Takeno et al. (2023) Takeno, S., Nomura, M., and Karasuyama, M. Towards practical preferential Bayesian optimization with skew Gaussian processes. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pp.  33516–33533, 2023.
  • Tversky & Kahneman (1974) Tversky, A. and Kahneman, D. Judgment under uncertainty: Heuristics and biases: Biases in judgments reveal some heuristics of thinking under uncertainty. Science, 185(4157):1124–1131, 1974.
  • Wächter & Biegler (2006) Wächter, A. and Biegler, L. T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
  • Wang et al. (2023) Wang, Y., Liu, Q., and Jin, C. Is RLHF more difficult than standard RL? A theoretical perspective. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Warnell et al. (2018) Warnell, G., Waytowich, N., Lawhern, V., and Stone, P. Deep TAMER: Interactive agent shaping in high-dimensional state spaces. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  • Wendland (2004) Wendland, H. Scattered data approximation, volume 17. Cambridge university press, 2004.
  • Wu et al. (2022) Wu, C., Li, T., Zhang, Z., and Yu, Y. Bayesian optimistic optimization: Optimistic exploration for model-based reinforcement learning. Advances in Neural Information Processing Systems, 35:14210–14223, 2022.
  • Wu (2017) Wu, Y. Lecture notes on information-theoretic methods for high-dimensional statistics. Lecture Notes for ECE598YW (UIUC), 16, 2017.
  • Xu et al. (2022a) Xu, W., Jiang, Y., Maddalena, E. T., and Jones, C. N. Lower bounds on the worst-case complexity of efficient global optimization. arXiv preprint arXiv:2209.09655, 2022a.
  • Xu et al. (2022b) Xu, W., Jones, C. N., Svetozarevic, B., Laughman, C. R., and Chakrabarty, A. VABO: Violation-aware Bayesian optimization for closed-loop control performance optimization with unmodeled constraints. In 2022 American Control Conference (ACC), pp.  5288–5293. IEEE, 2022b.
  • Xu et al. (2023) Xu, W., Jiang, Y., Svetozarevic, B., and Jones, C. Constrained efficient global optimization of expensive black-box functions. In International Conference on Machine Learning, pp.  38485–38498. PMLR, 2023.
  • Yue & Joachims (2009) Yue, Y. and Joachims, T. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning, pp.  1201–1208, 2009.
  • Yue et al. (2012) Yue, Y., Broder, J., Kleinberg, R., and Joachims, T. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.
  • Zhang et al. (2024) Zhang, H., Lee, S., and Tzempelikos, A. Bayesian meta-learning for personalized thermal comfort modeling. Building and Environment, 249:111129, February 2024. ISSN 03601323. doi: 10.1016/j.buildenv.2023.111129. URL https://linkinghub.elsevier.com/retrieve/pii/S0360132323011563.
  • Zhou (2002) Zhou, D.-X. The covering number in learning theory. Journal of Complexity, 18(3):739–767, 2002.
  • Zhou & Ji (2022) Zhou, X. and Ji, B. On kernelized multi-armed bandits with constraints. Advances in Neural Information Processing Systems, 35, 2022.
  • Zhu et al. (2023) Zhu, B., Jordan, M., and Jiao, J. Principled reinforcement learning with human feedback from pairwise or k-wise comparisons. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pp.  43037–43067, 23–29 Jul 2023.

Unless otherwise stated, all the results shown in this appendix hold under Assumptions 2.1, 2.2, 2.4, and 2.5.

Appendix A Preliminaries

To prepare for the proofs of the main results shown in this paper, we first state several useful lemmas.

Lemma A.1.

The function $\psi(y,y^{\prime})=\log(e^{y}+e^{y^{\prime}})$ is convex in $(y,y^{\prime})$.

Proof.

We calculate the Hessian of the function $\psi$ and derive

$$\nabla^{2}\psi=\frac{e^{y+y^{\prime}}}{(e^{y}+e^{y^{\prime}})^{2}}\begin{bmatrix}1&-1\\ -1&1\end{bmatrix}\succcurlyeq 0. \tag{37}$$

Hence, $\psi$ is convex.

Therefore, we can see $\ell(Z_{0:t}|\mathcal{D}_{t})$ is concave in $Z_{0:t}$.
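As a quick numerical sanity check of Lemma A.1 (not part of the proof), the closed-form Hessian can be compared against a finite-difference approximation at an arbitrary point. The evaluation point $(0.7,-1.2)$ and the step size are arbitrary choices; the sketch uses the identity $e^{y+y'}/(e^{y}+e^{y'})^{2}=1/(2+e^{y-y'}+e^{y'-y})$.

```python
import numpy as np

def psi(y, y_prime):
    """psi(y, y') = log(exp(y) + exp(y')), computed stably."""
    return np.logaddexp(y, y_prime)

def psi_hessian(y, y_prime):
    """Closed-form Hessian: coefficient times [[1, -1], [-1, 1]]."""
    d = y - y_prime
    coeff = 1.0 / (2.0 + np.exp(d) + np.exp(-d))
    return coeff * np.array([[1.0, -1.0], [-1.0, 1.0]])

def numerical_hessian(func, y, y_prime, h=1e-3):
    """Central finite-difference approximation of the 2x2 Hessian."""
    f_yy = (func(y + h, y_prime) - 2.0 * func(y, y_prime) + func(y - h, y_prime)) / h**2
    f_pp = (func(y, y_prime + h) - 2.0 * func(y, y_prime) + func(y, y_prime - h)) / h**2
    f_yp = (func(y + h, y_prime + h) - func(y + h, y_prime - h)
            - func(y - h, y_prime + h) + func(y - h, y_prime - h)) / (4.0 * h**2)
    return np.array([[f_yy, f_yp], [f_yp, f_pp]])

H_closed = psi_hessian(0.7, -1.2)
H_numeric = numerical_hessian(psi, 0.7, -1.2)

# Eigenvalues are 0 and twice the scalar coefficient, so the matrix is PSD.
eigenvalues = np.linalg.eigvalsh(H_closed)
```

The zero eigenvalue corresponds to the direction $(1,1)$: adding the same constant to $y$ and $y'$ shifts $\psi$ linearly, so $\psi$ is convex but not strictly convex.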

Lemma A.2.

$\forall\tilde{f}\in\mathcal{B}_{f},\ x\in\mathcal{X}$, $\tilde{f}(x)\in[-B,B]$.

Proof.

$|\tilde{f}(x)|=|\langle\tilde{f},k(x,\cdot)\rangle|\leq\lVert\tilde{f}\rVert\,\lVert k(x,\cdot)\rVert\leq B\sqrt{k(x,x)}\leq B$, where the equality follows from the reproducing property, the first inequality follows by the Cauchy–Schwarz inequality, the second inequality follows by Assump. 2.2, and the last inequality follows by Assump. 2.4. ∎
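The bound in Lemma A.2 can be illustrated numerically. The sketch below assumes a squared exponential kernel with $k(x,x)=1$ and builds a function as a finite kernel expansion $\tilde{f}=\sum_i \alpha_i k(x_i,\cdot)$, whose RKHS norm is $\sqrt{\alpha^{\top}K\alpha}$. The centers, coefficients, and the value $B=2$ are arbitrary illustrative choices.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0):
    """Squared exponential kernel with k(x, x) = 1 for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

rng = np.random.default_rng(0)
B = 2.0                                    # hypothetical norm bound
centers = rng.uniform(0.0, 10.0, size=15)  # expansion points x_i
alpha = rng.normal(size=15)                # expansion coefficients

# Rescale the coefficients so that the RKHS norm equals B exactly.
K = se_kernel(centers, centers)
rkhs_norm = np.sqrt(alpha @ K @ alpha)
alpha = alpha * (B / rkhs_norm)

# Evaluate the function on a dense grid and record its largest magnitude.
x_test = np.linspace(0.0, 10.0, 1001)
f_vals = se_kernel(x_test, centers) @ alpha
max_abs = float(np.max(np.abs(f_vals)))
```

By the lemma, `max_abs` cannot exceed `B`, regardless of how the centers and coefficients are drawn.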

Appendix B Properties of the function $\sigma(\cdot)$

When applying the function $\sigma$ to the difference of objective function values $\tilde{f}(x)-\tilde{f}(x^{\prime})$, $\forall\tilde{f}\in\mathcal{B}_{f}$, single-variable calculus gives

$$\begin{aligned}
u &:= \tilde{f}(x)-\tilde{f}(x^{\prime})\in[-2B,2B],\\
\sigma(u) &\in[\underline{\sigma},\bar{\sigma}],\\
\sigma^{\prime}(u) &= \frac{1}{2+e^{u}+e^{-u}}\in[\underline{\sigma^{\prime}},\bar{\sigma^{\prime}}],
\end{aligned}$$

where $\underline{\sigma}=1/(1+e^{2B})$, $\bar{\sigma}=1/(1+e^{-2B})$, $\underline{\sigma^{\prime}}=1/(2+e^{2B}+e^{-2B})$, and $\bar{\sigma^{\prime}}=1/4$.
We also introduce some constants $B_{p}=\frac{\bar{\sigma}}{\underline{\sigma}}-\frac{\underline{\sigma}}{\bar{\sigma}}$, $H_{\sigma}=\frac{1}{2\bar{\sigma}^{2}}$, and $C_{L}=1+\frac{2}{1+e^{-2B}}$, which will be used in the proof.
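For a concrete feel for these quantities, the following sketch evaluates the bounds and constants for a given $B$, following the formulas exactly as written above. The function name `sigma_constants` and the dictionary keys are hypothetical; $B=1$ is an arbitrary example value.

```python
import math

def sigma_constants(B):
    """Evaluate the bounds on sigma, sigma' and the constants B_p, H_sigma, C_L."""
    sigma_lo = 1.0 / (1.0 + math.exp(2.0 * B))       # lower bound on sigma(u)
    sigma_hi = 1.0 / (1.0 + math.exp(-2.0 * B))      # upper bound on sigma(u)
    dsigma_lo = 1.0 / (2.0 + math.exp(2.0 * B) + math.exp(-2.0 * B))
    dsigma_hi = 0.25                                 # sigma'(0), the maximum slope
    return {
        "sigma_lo": sigma_lo,
        "sigma_hi": sigma_hi,
        "dsigma_lo": dsigma_lo,
        "dsigma_hi": dsigma_hi,
        "B_p": sigma_hi / sigma_lo - sigma_lo / sigma_hi,
        "H_sigma": 1.0 / (2.0 * sigma_hi**2),
        "C_L": 1.0 + 2.0 / (1.0 + math.exp(-2.0 * B)),
    }

constants = sigma_constants(1.0)
```

Two identities are useful when reading the later proofs: $\underline{\sigma}+\bar{\sigma}=1$ and $\bar{\sigma}/\underline{\sigma}=e^{2B}$, so that $B_{p}=e^{2B}-e^{-2B}$ grows exponentially in the norm bound $B$.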

Appendix C Proof of Thm. 3.1

To prepare for the proof of the theorem, we first prove several lemmas.

Lemma C.1.

For any fixed $\hat{f}\in\mathcal{B}_{f}$, we have,

$$\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}\big((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\big)-\log\mathbb{P}_{f}\big((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}\big)\leq\sqrt{32tB^{2}\log\frac{1}{\delta_{t}}}\right)\geq 1-\delta_{t}, \tag{38}$$

where $f$ is the ground-truth function.
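The statement can be checked empirically in a simplified setting. The sketch below is a Monte Carlo illustration only, under assumptions that differ from the lemma: the query pairs are fixed in advance rather than chosen adaptively, the function values of $f$ and $\hat{f}$ are drawn uniformly from $[-B,B]$, and the preference model is taken to be $\mathbb{P}(\mathbf{1}_{\tau}=1)=e^{y_{\tau}}/(e^{y_{\tau}}+e^{y_{\tau}'})$. All numerical values are arbitrary.

```python
import numpy as np

def log_likelihood(values, values_prime, outcomes):
    """Log-likelihood of outcomes under P(1 = 1) = e^v / (e^v + e^{v'})."""
    linear = values * outcomes + values_prime * (1.0 - outcomes)
    return np.sum(linear - np.logaddexp(values, values_prime), axis=-1)

rng = np.random.default_rng(0)
B, t, delta_t, n_trials = 1.0, 50, 0.05, 2000

# Hypothetical function values in [-B, B] at t fixed query pairs.
y, y_prime = rng.uniform(-B, B, size=(2, t))   # ground truth f
z, z_prime = rng.uniform(-B, B, size=(2, t))   # fixed alternative f_hat

# Simulate n_trials independent sequences of preference outcomes under f.
p = 1.0 / (1.0 + np.exp(-(y - y_prime)))
outcomes = (rng.random((n_trials, t)) < p).astype(float)

# Log-likelihood gap between f_hat and f, and the threshold from the lemma.
gap = log_likelihood(z, z_prime, outcomes) - log_likelihood(y, y_prime, outcomes)
threshold = np.sqrt(32.0 * t * B**2 * np.log(1.0 / delta_t))
coverage = float(np.mean(gap <= threshold))
```

In this simplified setting the fraction `coverage` of trials satisfying the inequality should be at least $1-\delta_t$; in practice the bound is conservative, since the expected gap is a negative Kullback-Leibler term.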

Proof.

We use yτsubscript𝑦𝜏y_{\tau}italic_y start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT (yτsuperscriptsubscript𝑦𝜏y_{\tau}^{\prime}italic_y start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT resp.) to denote f(xτ)𝑓subscript𝑥𝜏f(x_{\tau})italic_f ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) (f(xτ)𝑓superscriptsubscript𝑥𝜏f(x_{\tau}^{\prime})italic_f ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) resp.). We use zτsubscript𝑧𝜏z_{\tau}italic_z start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT (zτsuperscriptsubscript𝑧𝜏z_{\tau}^{\prime}italic_z start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT resp.) to denote f^(xτ)^𝑓subscript𝑥𝜏\hat{f}(x_{\tau})over^ start_ARG italic_f end_ARG ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) (f^(xτ)^𝑓superscriptsubscript𝑥𝜏\hat{f}(x_{\tau}^{\prime})over^ start_ARG italic_f end_ARG ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) resp.). And we use pτsubscript𝑝𝜏p_{\tau}italic_p start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT to denote σ(yτyτ)𝜎subscript𝑦𝜏superscriptsubscript𝑦𝜏\sigma(y_{\tau}-y_{\tau}^{\prime})italic_σ ( italic_y start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ).

\begin{align*}
&\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})\leq\xi\right)\\
={}&\mathbb{P}\left(\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\log\left(e^{z_\tau}+e^{z_\tau'}\right)+\sum_{\tau=1}^{t}\log\left(e^{y_\tau}+e^{y_\tau'}\right)\leq\xi\right)\\
={}&\mathbb{P}\left(\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\leq\xi'\right),
\end{align*}

where $\xi'=\xi+\sum_{\tau=1}^{t}\log\left(e^{z_\tau}+e^{z_\tau'}\right)-\sum_{\tau=1}^{t}\log\left(e^{y_\tau}+e^{y_\tau'}\right)-\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)$, and the probability $\mathbb{P}$ is taken with respect to the randomness from the comparison oracle and the randomness from the algorithm.

It can be checked that $\psi_\tau(y,y') := \log\left(e^{y}+e^{y'}\right)-p_\tau y-(1-p_\tau)y'$ is a convex function and $\nabla\psi_\tau(y_\tau,y_\tau')=0$. This implies that $(y_\tau,y_\tau')$ achieves the minimum of the convex function $\psi_\tau$. Therefore,

\[
\log\left(e^{y_\tau}+e^{y_\tau'}\right)-p_\tau y_\tau-(1-p_\tau)y_\tau'\leq\log\left(e^{z_\tau}+e^{z_\tau'}\right)-p_\tau z_\tau-(1-p_\tau)z_\tau'.
\]

Rearrangement gives,

\[
\log\left(e^{z_\tau}+e^{z_\tau'}\right)-\log\left(e^{y_\tau}+e^{y_\tau'}\right)-\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\geq 0.
\]

Summing this inequality over $\tau=1,\dots,t$ yields $\xi'\geq\xi$. Therefore,

\begin{align*}
&\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})\leq\xi\right)\\
={}&\mathbb{P}\left(\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\leq\xi'\right)\\
\geq{}&\mathbb{P}\left(\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\leq\xi\right).
\end{align*}
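The per-round non-negativity that gives $\xi'\geq\xi$ can be checked numerically. The following is a minimal sketch, assuming $\sigma$ is the logistic function (consistent with $\nabla\psi_\tau(y_\tau,y_\tau')=0$) and that all function values lie in $[-B,B]$; the helper names `psi`, `gap`, and `min_gap`, and the value `B=2.0`, are illustrative choices and do not come from the paper.

```python
import math
import random


def sigmoid(u):
    """Logistic link, assumed here for sigma."""
    return 1.0 / (1.0 + math.exp(-u))


def psi(y, y_prime, p):
    """psi_tau(y, y') = log(e^y + e^{y'}) - p*y - (1 - p)*y'."""
    return math.log(math.exp(y) + math.exp(y_prime)) - p * y - (1.0 - p) * y_prime


def gap(y, y_prime, z, z_prime):
    """psi_tau(z, z') - psi_tau(y, y') with p_tau = sigmoid(y - y').

    This equals the left-hand side of the rearranged inequality.
    """
    p = sigmoid(y - y_prime)
    return psi(z, z_prime, p) - psi(y, y_prime, p)


def min_gap(num_trials=10000, B=2.0, seed=0):
    """Smallest gap observed over random values drawn from [-B, B]."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(num_trials):
        y, y_prime, z, z_prime = (rng.uniform(-B, B) for _ in range(4))
        worst = min(worst, gap(y, y_prime, z, z_prime))
    return worst
```

Up to floating-point rounding, `min_gap()` should never be negative, and `gap` vanishes when $(z_\tau,z_\tau')=(y_\tau,y_\tau')$.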

We further notice that

\[
\mathbb{E}\left[\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\,\middle|\,\mathcal{F}_{\tau-1}\right]=0, \tag{39}
\]

and with probability one,

\[
\left\lvert\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\right\rvert=\left\lvert(z_\tau-y_\tau-z_\tau'+y_\tau')(\mathbf{1}_\tau-p_\tau)\right\rvert\leq 4B. \tag{40}
\]
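The two properties (39) and (40) can be verified directly for a single round. The sketch below assumes a logistic $\sigma$, a Bernoulli outcome $\mathbf{1}_\tau$ with success probability $p_\tau$, and function values bounded in absolute value by $B$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def sigmoid(u):
    """Logistic link, assumed here for sigma."""
    return 1.0 / (1.0 + math.exp(-u))


def increment(y, y_prime, z, z_prime, outcome):
    """Observed term minus its conditional mean, for outcome in {0, 1}."""
    p = sigmoid(y - y_prime)
    observed = (z - y) * outcome + (z_prime - y_prime) * (1 - outcome)
    expected = (z - y) * p + (z_prime - y_prime) * (1 - p)
    return observed - expected


def closed_form(y, y_prime, z, z_prime, outcome):
    """Factored form (z - y - z' + y') * (outcome - p) from (40)."""
    p = sigmoid(y - y_prime)
    return (z - y - z_prime + y_prime) * (outcome - p)


def conditional_mean(y, y_prime, z, z_prime):
    """Exact expectation of the increment over outcome ~ Bernoulli(p)."""
    p = sigmoid(y - y_prime)
    return (p * increment(y, y_prime, z, z_prime, 1)
            + (1 - p) * increment(y, y_prime, z, z_prime, 0))
```

For any inputs in $[-B,B]$, `conditional_mean` returns zero up to rounding, and the absolute value of `increment` stays below $4B$ because $\lvert\mathbf{1}_\tau-p_\tau\rvert\leq 1$.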

We can thus apply the Azuma-Hoeffding inequality (see, e.g., Lalley, 2013), which gives

\begin{align*}
&\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})\leq\xi\right)\\
\geq{}&\mathbb{P}\left(\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)\mathbf{1}_\tau+(z_\tau'-y_\tau')(1-\mathbf{1}_\tau)\right)-\sum_{\tau=1}^{t}\left((z_\tau-y_\tau)p_\tau+(z_\tau'-y_\tau')(1-p_\tau)\right)\leq\xi\right)\\
\geq{}&1-\exp\left\{-\frac{\xi^{2}}{32tB^{2}}\right\}.
\end{align*}

Set $\exp\left\{-\frac{\xi^{2}}{32tB^{2}}\right\}=\delta_t$, that is, $\xi=\sqrt{32tB^{2}\log\frac{1}{\delta_t}}$. We then get the desired result. ∎
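The relation between the threshold $\xi$ and the failure probability $\delta_t$ is easy to evaluate. As a rough sketch (the function names and the sample values of $t$, $B$, and $\delta_t$ are assumptions made only for illustration), one can compute the threshold and confirm that it inverts the exponential tail bound:

```python
import math


def confidence_radius(t, B, delta_t):
    """xi = sqrt(32 * t * B^2 * log(1 / delta_t))."""
    return math.sqrt(32.0 * t * B**2 * math.log(1.0 / delta_t))


def failure_bound(xi, t, B):
    """Tail bound exp(-xi^2 / (32 * t * B^2))."""
    return math.exp(-xi**2 / (32.0 * t * B**2))


# Illustrative values only: 100 comparisons, B = 1, delta_t = 0.05.
xi = confidence_radius(t=100, B=1.0, delta_t=0.05)
recovered = failure_bound(xi, t=100, B=1.0)  # matches delta_t up to rounding
```

For fixed $\delta_t$ the threshold grows as $\sqrt{t}$, so quadrupling the number of comparisons doubles $\xi$.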

We then have the following high probability confidence set lemma.

Lemma C.2.

For any fixed $\hat{f}\in\mathcal{B}_f$ that is independent of $((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})$, we have, with probability at least $1-\delta$,

\[
\log\mathbb{P}_{\hat{f}}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})\leq\sqrt{32tB^{2}\log\frac{\pi^{2}t^{2}}{6\delta}},\quad\forall t\geq 1. \tag{41}
\]
Proof.

We use $\mathcal{E}_t$ to denote the event $\log\mathbb{P}_{\hat{f}}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_\tau,x_\tau',\mathbf{1}_\tau)_{\tau=1}^{t})\leq\sqrt{32tB^{2}\log\frac{1}{\delta_t}}$. We pick $\delta_t=6\delta/(\pi^{2}t^{2})$ and have,

(logf^((xτ,xτ,𝟏τ)τ=1t)logf((xτ,xτ,𝟏τ)τ=1t)32tB2log1δt,t1)formulae-sequencesubscript^𝑓superscriptsubscriptsubscript𝑥𝜏superscriptsubscript𝑥𝜏subscript1𝜏𝜏1𝑡subscript𝑓superscriptsubscriptsubscript𝑥𝜏superscriptsubscript𝑥𝜏subscript1𝜏𝜏1𝑡32𝑡superscript𝐵21subscript𝛿𝑡for-all𝑡1\displaystyle\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{% \prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{{f}}((x_{\tau},x_{% \tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\leq\sqrt{32tB^{2}\log\frac{1}{% \delta_{t}}},\forall t\geq 1\right)blackboard_P ( roman_log blackboard_P start_POSTSUBSCRIPT over^ start_ARG italic_f end_ARG end_POSTSUBSCRIPT ( ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_1 start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_τ = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) - roman_log blackboard_P start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ( ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , bold_1 start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_τ = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) ≤ square-root start_ARG 32 italic_t italic_B start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log divide start_ARG 1 end_ARG start_ARG italic_δ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG end_ARG , ∀ italic_t ≥ 1 )
=\displaystyle== 1(t=1t¯)1¯superscriptsubscript𝑡1subscript𝑡\displaystyle 1-\mathbb{P}\left(\overline{\cap_{t=1}^{\infty}\mathcal{E}_{t}}\right)1 - blackboard_P ( over¯ start_ARG ∩ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT caligraphic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG )
=\displaystyle== 1(t=1t¯)1superscriptsubscript𝑡1¯subscript𝑡\displaystyle 1-\mathbb{P}\left({\cup_{t=1}^{\infty}\overline{\mathcal{E}_{t}}% }\right)1 - blackboard_P ( ∪ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT over¯ start_ARG caligraphic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG )
\displaystyle\geq 1t=1(t¯)1superscriptsubscript𝑡1¯subscript𝑡\displaystyle 1-\sum_{t=1}^{\infty}\mathbb{P}\left(\overline{\mathcal{E}_{t}}\right)1 - ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( over¯ start_ARG caligraphic_E start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG )
= 1-\sum_{t=1}^{\infty}\left(1-\mathbb{P}\left(\mathcal{E}_{t}\right)\right)
= 1-\sum_{t=1}^{\infty}\left(1-\mathbb{P}\left(\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\leq\sqrt{32tB^{2}\log\frac{1}{\delta_{t}}}\right)\right)
\geq 1-\sum_{t=1}^{\infty}\delta_{t}
= 1-\frac{6\delta}{\pi^{2}}\sum_{t=1}^{\infty}\frac{1}{t^{2}}
= 1-\delta.
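The last two steps of the chain rely on the per-round failure probabilities summing to $\delta$. The short Python sketch below checks this numerically, assuming $\delta_t = 6\delta/(\pi^2 t^2)$ as read off from the last two lines of the chain; the function names and the value of `delta` are illustrative only.

```python
import math


def delta_t(delta, t):
    """Per-round failure probability, assumed to be 6*delta / (pi^2 * t^2)."""
    return 6.0 * delta / (math.pi ** 2 * t ** 2)


def partial_sum(delta, T):
    """Sum of delta_t over t = 1, ..., T."""
    return sum(delta_t(delta, t) for t in range(1, T + 1))


if __name__ == "__main__":
    delta = 0.05  # illustrative confidence level
    for T in (10, 1000, 100000):
        # Partial sums increase towards delta as T grows.
        print(T, partial_sum(delta, T))
```

The partial sums stay below $\delta$ and approach it as $T$ grows, which is the Basel identity $\sum_{t\geq 1} t^{-2} = \pi^2/6$ used in the final equality.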

We next state a lemma that bounds the difference in log-likelihood between two functions that are close in the infinity norm.

Lemma C.3.

There exists a constant $C_{L}>0$, independent of $\epsilon$ and $t$, such that, $\forall\epsilon>0$ and $\forall f_{1},f_{2}\in\mathcal{B}_{f}$ that satisfy $\|f_{1}-f_{2}\|_{\infty}\leq\epsilon$, we have,

$$\log\mathbb{P}_{f_{1}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f_{2}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\leq C_{L}\epsilon t. \tag{42}$$
Proof.

We use $z_{i,\tau}$ ($z_{i,\tau}^{\prime}$, resp.) to denote $f_{i}(x_{\tau})$ ($f_{i}(x_{\tau}^{\prime})$, resp.), $\forall i\in\{1,2\}$.

\begin{align}
&\log\mathbb{P}_{f_{1}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f_{2}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}) \notag\\
={}& \sum_{\tau=1}^{t}\left((z_{1,\tau}-z_{2,\tau})\mathbf{1}_{\tau}+(z_{1,\tau}^{\prime}-z_{2,\tau}^{\prime})(1-\mathbf{1}_{\tau})\right)-\sum_{\tau=1}^{t}\log\left(e^{z_{1,\tau}}+e^{z_{1,\tau}^{\prime}}\right)+\sum_{\tau=1}^{t}\log\left(e^{z_{2,\tau}}+e^{z_{2,\tau}^{\prime}}\right) \tag{43}\\
\leq{}& \epsilon t+\sum_{\tau=1}^{t}\max_{z,z^{\prime}\in[-B,B]}\left\|\nabla_{z,z^{\prime}}\log\left(e^{z}+e^{z^{\prime}}\right)\right\|\left\|(z_{1,\tau},z_{1,\tau}^{\prime})-(z_{2,\tau},z_{2,\tau}^{\prime})\right\| \tag{44}\\
\leq{}& \epsilon t+\sum_{\tau=1}^{t}\frac{\sqrt{2}}{1+e^{-2B}}\sqrt{2}\epsilon \tag{45}\\
={}& \left(1+\frac{2}{1+e^{-2B}}\right)\epsilon t, \tag{46}
\end{align}

where the equality (43) follows by the definition of the log-likelihood function, and the inequality (44) follows by the assumption and the mean-value theorem. The conclusion follows by setting $C_{L}=1+\frac{2}{1+e^{-2B}}$.
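As a sanity check on Lemma C.3, the following Python sketch draws random function values in $[-B,B]$, perturbs them by at most $\epsilon$, and compares the resulting log-likelihood gap with $C_L\epsilon t$. It is a numerical illustration under assumed values of `B`, `eps` and `t`, not part of the proof; all function names are hypothetical.

```python
import math
import random


def log_likelihood(z, z_prime, ones):
    """Sum over tau of z*1 + z'*(1-1) - log(e^z + e^z'), as in (43)."""
    total = 0.0
    for a, b, y in zip(z, z_prime, ones):
        total += a * y + b * (1 - y) - math.log(math.exp(a) + math.exp(b))
    return total


def lipschitz_constant(B):
    """C_L = 1 + 2 / (1 + e^{-2B}), the constant set at the end of the proof."""
    return 1.0 + 2.0 / (1.0 + math.exp(-2.0 * B))


def perturb(values, eps, B, rng):
    """Move each value by at most eps, then clip back into [-B, B]."""
    return [min(B, max(-B, v + rng.uniform(-eps, eps))) for v in values]


def max_gap(B, eps, t, trials=200, seed=0):
    """Largest observed log-likelihood gap over random trials."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        z1 = [rng.uniform(-B, B) for _ in range(t)]
        z1p = [rng.uniform(-B, B) for _ in range(t)]
        z2 = perturb(z1, eps, B, rng)
        z2p = perturb(z1p, eps, B, rng)
        ones = [rng.randint(0, 1) for _ in range(t)]
        gap = log_likelihood(z1, z1p, ones) - log_likelihood(z2, z2p, ones)
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    B, eps, t = 1.0, 0.1, 50  # illustrative values
    print(max_gap(B, eps, t), "<=", lipschitz_constant(B) * eps * t)
```

Clipping after the perturbation keeps every value in $[-B,B]$ without increasing its distance to the unperturbed value, so the sampled pairs satisfy the hypothesis of the lemma.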

Main proof: We use $\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})$ to denote the covering number of the set $\mathcal{B}_{f}$, and let $(f^{\epsilon}_{i})_{i=1}^{\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}$ be an $\epsilon$-covering of $\mathcal{B}_{f}$. Resetting the '$\delta$' in Lem. C.2 to $\delta/\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})$ and applying the union bound, we have, with probability at least $1-\delta$, $\forall f_{i}^{\epsilon}$ and $\forall t\geq 1$,

$$\log\mathbb{P}_{f_{i}^{\epsilon}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\leq\sqrt{32tB^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{6\delta}}. \tag{47}$$

By the definition of an $\epsilon$-covering, there exists $j\in[\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})]$ such that,

$$\|\hat{f}_{t}^{\mathrm{MLE}}-f_{j}^{\epsilon}\|_{\infty}\leq\epsilon. \tag{48}$$

Hence, with probability at least $1-\delta$,

\begin{align*}
&\log\mathbb{P}_{\hat{f}_{t}^{\mathrm{MLE}}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\\
={}& \log\mathbb{P}_{\hat{f}_{t}^{\mathrm{MLE}}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f_{j}^{\epsilon}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\log\mathbb{P}_{f_{j}^{\epsilon}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\\
\leq{}& C_{L}\epsilon t+\sqrt{32tB^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{6\delta}},
\end{align*}

where the inequality follows by Lem. C.3 and the inequality (47).

Appendix D Proof of Lem. 3.5

We first have a lemma.

Lemma D.1.

We have,

$$\log\hat{p}-\log p\leq\frac{1}{p}(\hat{p}-p)-H_{\sigma}(\hat{p}-p)^{2},\quad\forall p,\hat{p}\in[\underline{\sigma},\bar{\sigma}], \tag{49}$$

where $H_{\sigma}=\frac{1}{2\bar{\sigma}^{2}}$.

Proof.

Let $\zeta(\hat{p})=\log\hat{p}-\log p-\frac{1}{p}(\hat{p}-p)+H_{\sigma}(\hat{p}-p)^{2},\ \forall p,\hat{p}\in[\underline{\sigma},\bar{\sigma}]$. We have,

$$\zeta^{\prime}(\hat{p})=\frac{1}{\hat{p}}-\frac{1}{p}+2H_{\sigma}(\hat{p}-p)=(\hat{p}-p)\left(\frac{1}{\bar{\sigma}^{2}}-\frac{1}{\hat{p}p}\right),\quad\forall\hat{p}\in[\underline{\sigma},\bar{\sigma}].$$

Since $p,\hat{p}\in[\underline{\sigma},\bar{\sigma}]$, we have $\hat{p}p\leq\bar{\sigma}^{2}$ and hence $\frac{1}{\bar{\sigma}^{2}}-\frac{1}{\hat{p}p}\leq 0$. Therefore, $\zeta^{\prime}(\hat{p})\geq 0,\ \forall\hat{p}\in[\underline{\sigma},p]$ and $\zeta^{\prime}(\hat{p})\leq 0,\ \forall\hat{p}\in[p,\bar{\sigma}]$, so $\zeta(\hat{p})$ achieves its maximum over $[\underline{\sigma},\bar{\sigma}]$ at the point $p$. So $\zeta(\hat{p})\leq\zeta(p)=0$. Rearrangement then gives the desired result. ∎
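The inequality of Lemma D.1 can also be checked numerically. The sketch below evaluates $\zeta(\hat{p})$ on a grid of pairs $(p,\hat{p})$ and reports its largest value, which should not exceed zero. The interval endpoints `0.12` and `0.88` are illustrative stand-ins for $\underline{\sigma}$ and $\bar{\sigma}$, not values taken from the paper, and the function names are hypothetical.

```python
import math


def zeta(p_hat, p, sigma_bar):
    """zeta(p_hat) from the proof of Lemma D.1, with H_sigma = 1 / (2 * sigma_bar^2)."""
    h_sigma = 1.0 / (2.0 * sigma_bar ** 2)
    return (math.log(p_hat) - math.log(p)
            - (p_hat - p) / p
            + h_sigma * (p_hat - p) ** 2)


def max_zeta_on_grid(sigma_low, sigma_bar, n=200):
    """Maximum of zeta over an n-by-n grid of (p, p_hat) in [sigma_low, sigma_bar]."""
    step = (sigma_bar - sigma_low) / (n - 1)
    grid = [sigma_low + step * i for i in range(n)]
    return max(zeta(p_hat, p, sigma_bar) for p in grid for p_hat in grid)


if __name__ == "__main__":
    # Illustrative interval endpoints (assumed, not from the paper).
    print(max_zeta_on_grid(0.12, 0.88))
```

The maximum is attained on the diagonal $\hat{p}=p$, where $\zeta$ vanishes, in line with the monotonicity argument in the proof.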

For any fixed function $\hat{f}\in\mathcal{B}_{f}$, we use the notations $\hat{p}_{\tau}=\sigma(\hat{f}(x_{\tau})-\hat{f}(x_{\tau}^{\prime}))\in[\underline{\sigma},\bar{\sigma}]$ and $p_{\tau}=\sigma(f(x_{\tau})-f(x_{\tau}^{\prime}))\in[\underline{\sigma},\bar{\sigma}]$. We have,

\begin{align*}
&\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\\
={}& \sum_{\tau=1}^{t}\left(\log p_{\hat{f}}(x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})-\log p_{f}(x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})\right)\\
={}& \sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\left(\log\hat{p}_{\tau}-\log p_{\tau}\right)+(1-\mathbf{1}_{\tau})\left(\log(1-\hat{p}_{\tau})-\log(1-p_{\tau})\right)\right).
\end{align*}

Hence,

\begin{align*}
&\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\\
={}& \sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\left(\log\hat{p}_{\tau}-\log p_{\tau}\right)+(1-\mathbf{1}_{\tau})\left(\log(1-\hat{p}_{\tau})-\log(1-p_{\tau})\right)\right)\\
\leq{}& \sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\left(\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}-H_{\sigma}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\right)+(1-\mathbf{1}_{\tau})\left(\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}-H_{\sigma}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\right)\right).
\end{align*}
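The $(1-\mathbf{1}_{\tau})$ term in the last inequality comes from applying Lem. D.1 to the complementary probabilities. A sketch of that step is given below; it assumes that $1-\hat{p}_{\tau}$ and $1-p_{\tau}$ also lie in $[\underline{\sigma},\bar{\sigma}]$, which holds for instance when $\underline{\sigma}=1-\bar{\sigma}$ (an assumption not stated in this section).

```latex
\begin{aligned}
\log(1-\hat{p}_{\tau})-\log(1-p_{\tau})
&\leq \frac{(1-\hat{p}_{\tau})-(1-p_{\tau})}{1-p_{\tau}}
      -H_{\sigma}\big((1-\hat{p}_{\tau})-(1-p_{\tau})\big)^{2}\\
&= \frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}
   -H_{\sigma}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}.
\end{aligned}
```

Since $\mathbf{1}_{\tau}+(1-\mathbf{1}_{\tau})=1$, the two quadratic terms combine into a single $H_{\sigma}(\hat{p}_{\tau}-p_{\tau})^{2}$ per round.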

Rearrangement gives,

$$H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}+\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\leq\sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-\mathbf{1}_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\right). \tag{50}$$

We then have the following lemma.

Lemma D.2.

For any fixed $\hat{f}\in\mathcal{B}_{f}$ and any $t\geq 1$, we have

\[
\mathbb{P}\left(H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\leq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}}\right)\geq 1-\delta_{t}. \tag{51}
\]
Proof.

Since
\[
\mathbb{E}\left[\mathbf{1}_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-\mathbf{1}_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\,\middle|\,\mathcal{F}_{\tau-1}\right]=\mathbb{E}\left[p_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-p_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\,\middle|\,\mathcal{F}_{\tau-1}\right]=0
\]
and, with probability one,

\begin{align*}
\left\lvert\mathbf{1}_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-\mathbf{1}_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\right\rvert
&\leq\mathbf{1}_{\tau}\left\lvert\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}\right\rvert+(1-\mathbf{1}_{\tau})\left\lvert\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\right\rvert \tag{52}\\
&=\mathbf{1}_{\tau}\left\lvert\frac{\hat{p}_{\tau}}{p_{\tau}}-1\right\rvert+(1-\mathbf{1}_{\tau})\left\lvert\frac{1-\hat{p}_{\tau}}{1-p_{\tau}}-1\right\rvert \tag{53}\\
&\leq\frac{\bar{\sigma}}{\underline{\sigma}}-\frac{\underline{\sigma}}{\bar{\sigma}}=B_{p}, \tag{54}
\end{align*}

where the last inequality follows from $\hat{p}_{\tau},p_{\tau},1-\hat{p}_{\tau},1-p_{\tau}\in[\underline{\sigma},\bar{\sigma}]$. The summands therefore form a martingale difference sequence with increments bounded by $B_{p}$, so the Azuma–Hoeffding inequality applies and gives, for any $\xi>0$,

\[
\mathbb{P}\left(\sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-\mathbf{1}_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\right)\leq\xi\right)\geq 1-\exp\left\{-\frac{\xi^{2}}{2tB_{p}^{2}}\right\}. \tag{55}
\]

We set $\exp\left\{-\frac{\xi^{2}}{2tB_{p}^{2}}\right\}=\delta_{t}$, that is, $\xi=\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}}$, and derive

\[
\mathbb{P}\left(\sum_{\tau=1}^{t}\left(\mathbf{1}_{\tau}\frac{\hat{p}_{\tau}-p_{\tau}}{p_{\tau}}+(1-\mathbf{1}_{\tau})\frac{p_{\tau}-\hat{p}_{\tau}}{1-p_{\tau}}\right)\leq\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}}\right)\geq 1-\delta_{t}. \tag{56}
\]

Rearranging inequality (50) and bounding its right-hand side by inequality (56) yields the desired result.
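The concentration step in this proof can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' code. It assumes a logistic link for $\sigma$, a hypothetical bound `z_max` on $|f(x)-f(x^{\prime})|$, and a fixed (non-adaptive) sequence of preference probabilities, whereas the lemma also covers adaptively chosen queries. The names `increment`, `conditional_mean`, and `azuma_threshold` are introduced only for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic link; an illustrative choice of sigma, not fixed by the lemma."""
    return 1.0 / (1.0 + math.exp(-z))


def increment(indicator, p, p_hat):
    """One summand on the right-hand side of (50)."""
    if indicator:
        return (p_hat - p) / p
    return (p - p_hat) / (1.0 - p)


def conditional_mean(p, p_hat):
    """Expectation of the summand when the indicator is Bernoulli(p)."""
    return p * increment(True, p, p_hat) + (1.0 - p) * increment(False, p, p_hat)


def azuma_threshold(t, b_p, delta_t):
    """Right-hand side of (56): sqrt(2 t B_p^2 log(1/delta_t))."""
    return math.sqrt(2.0 * t * b_p ** 2 * math.log(1.0 / delta_t))


rng = random.Random(0)
t, delta_t, trials = 200, 0.05, 2000
z_max = 2.0  # hypothetical bound on |f(x) - f(x')|
sigma_lo, sigma_hi = sigmoid(-z_max), sigmoid(z_max)
b_p = sigma_hi / sigma_lo - sigma_lo / sigma_hi  # B_p as defined in (54)

# Fixed query sequence: true (p) and candidate (p_hat) preference probabilities.
p = [sigmoid(rng.uniform(-z_max, z_max)) for _ in range(t)]
p_hat = [sigmoid(rng.uniform(-z_max, z_max)) for _ in range(t)]

threshold = azuma_threshold(t, b_p, delta_t)
violations = 0
for _ in range(trials):
    total = sum(increment(rng.random() < p_t, p_t, q_t) for p_t, q_t in zip(p, p_hat))
    violations += total > threshold

violation_rate = violations / trials
max_abs_increment = max(
    max(abs(increment(True, a, b)), abs(increment(False, a, b)))
    for a, b in zip(p, p_hat)
)
print(violation_rate, max_abs_increment, b_p)
```

In this setting the empirical violation rate should not exceed $\delta_{t}$, and every summand should stay below $B_{p}$ in absolute value, in line with (54) and (56). The bound is typically loose, so observed violation rates can be far below $\delta_{t}$.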

Lemma D.3.

For any fixed $\hat{f}\in\mathcal{B}_{f}$, we have, with probability at least $1-\delta$,

\[
H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\leq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\sqrt{2tB_{p}^{2}\log\frac{\pi^{2}t^{2}}{6\delta}},\;\;\forall t\geq 1. \tag{57}
\]
Proof.

We use $\mathcal{E}_{t}$ (with a slight abuse of notation; $\mathcal{E}_{t}$ is local to this proof) to denote the event
\[
H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\leq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}}
\]
and pick $\delta_{t}=(6\delta)/(\pi^{2}t^{2})$. We have,

\begin{align*}
&\mathbb{P}\left(H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\leq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}},\ \forall t\geq 1\right)\\
={}&1-\mathbb{P}\left(\overline{\cap_{t=1}^{\infty}\mathcal{E}_{t}}\right)\\
={}&1-\mathbb{P}\left(\cup_{t=1}^{\infty}\overline{\mathcal{E}_{t}}\right)\\
\geq{}&1-\sum_{t=1}^{\infty}\mathbb{P}\left(\overline{\mathcal{E}_{t}}\right)\\
={}&1-\sum_{t=1}^{\infty}\left(1-\mathbb{P}\left(\mathcal{E}_{t}\right)\right)\\
={}&1-\sum_{t=1}^{\infty}\left(1-\mathbb{P}\left(H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2}\leq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})+\sqrt{2tB_{p}^{2}\log\frac{1}{\delta_{t}}}\right)\right)\\
\geq{}&1-\sum_{t=1}^{\infty}\delta_{t}\\
={}&1-\frac{6\delta}{\pi^{2}}\sum_{t=1}^{\infty}\frac{1}{t^{2}}\\
={}&1-\delta.
\end{align*}
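The choice $\delta_{t}=6\delta/(\pi^{2}t^{2})$ works because $\sum_{t\geq 1}1/t^{2}=\pi^{2}/6$, so the per-step failure budgets sum to exactly $\delta$. The sketch below checks this numerically and confirms that plugging $\delta_{t}$ into the radius of Lemma D.2 reproduces the radius in (57). The values of `delta` and `b_p` are arbitrary illustrations, and the function names are hypothetical.

```python
import math


def delta_schedule(delta, t):
    """Per-step failure budget delta_t = 6 delta / (pi^2 t^2)."""
    return 6.0 * delta / (math.pi ** 2 * t ** 2)


def radius_d2(t, b_p, delta_t):
    """Radius in Lemma D.2: sqrt(2 t B_p^2 log(1/delta_t))."""
    return math.sqrt(2.0 * t * b_p ** 2 * math.log(1.0 / delta_t))


def radius_d3(t, b_p, delta):
    """Radius in (57): sqrt(2 t B_p^2 log(pi^2 t^2 / (6 delta)))."""
    return math.sqrt(
        2.0 * t * b_p ** 2 * math.log(math.pi ** 2 * t ** 2 / (6.0 * delta))
    )


delta, b_p = 0.1, 1.5  # illustrative values only
# Partial sum of the budgets; it approaches delta from below as more terms are added.
partial_budget = sum(delta_schedule(delta, t) for t in range(1, 100001))
print(partial_budget)
```

The partial sum stays strictly below `delta`, which is the content of the last two lines of the derivation above.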

Main Proof: Resetting the ‘$\delta$’ in Lem. D.3 to $\delta/\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})$ and applying the union bound, we can guarantee that Eq. (57) holds simultaneously for all functions in an $\epsilon$-covering of $\mathcal{B}_{f}$ with probability at least $1-\delta$.
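To see what the covering argument costs, the sketch below evaluates the radius of (57) after $\delta$ is replaced by $\delta/\mathcal{N}$. The covering number enters only through an additive $\log\mathcal{N}$ inside the square root. The covering numbers used here are made-up inputs, not values derived from any particular kernel, and `radius_single` and `radius_cover` are hypothetical helper names.

```python
import math


def radius_single(t, b_p, delta):
    """Radius in (57) for one fixed function."""
    return math.sqrt(
        2.0 * t * b_p ** 2 * math.log(math.pi ** 2 * t ** 2 / (6.0 * delta))
    )


def radius_cover(t, b_p, delta, n_cover):
    """Radius in (57) after resetting delta to delta / n_cover (union bound)."""
    return radius_single(t, b_p, delta / n_cover)


delta, b_p, t = 0.1, 1.5, 50  # illustrative values only
for n_cover in (1, 10, 1000, 10 ** 6):  # hypothetical covering numbers
    print(n_cover, radius_cover(t, b_p, delta, n_cover))
```

The squared radius grows by $2tB_{p}^{2}\log\mathcal{N}$ relative to the single-function case, so the dependence on the covering number is only logarithmic.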

For any $\hat{f}_{t+1}\in\mathcal{B}_{f}^{t+1}\subset\mathcal{B}_{f}$, there exists a function in the $\epsilon$-covering of $\mathcal{B}_{f}$, which we set to be $\hat{f}$, such that $\|\hat{f}_{t+1}-\hat{f}\|_{\infty}\leq\epsilon$. We also use $\hat{p}_{\tau}^{t+1}$ to denote $\sigma(\hat{f}_{t+1}(x_{\tau})-\hat{f}_{t+1}(x_{\tau}^{\prime}))$. Thus, we have,

\begin{align*}
&H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}^{t+1}_{\tau}-p_{\tau}\right)^{2} \tag{58}\\
\leq{}&2H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}^{t+1}_{\tau}-\hat{p}_{\tau}\right)^{2}+2H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2} \tag{59}\\
\leq{}&2H_{\sigma}\bar{\sigma^{\prime}}^{2}\sum_{\tau=1}^{t}\left((\hat{f}_{t+1}(x_{\tau})-\hat{f}_{t+1}(x^{\prime}_{\tau}))-(\hat{f}(x_{\tau})-\hat{f}(x^{\prime}_{\tau}))\right)^{2}+2H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2} \tag{60}\\
\leq{}&8H_{\sigma}\bar{\sigma^{\prime}}^{2}\sum_{\tau=1}^{t}\epsilon^{2}+2H_{\sigma}\sum_{\tau=1}^{t}\left(\hat{p}_{\tau}-p_{\tau}\right)^{2} \tag{61}\\
\leq{}&8H_{\sigma}\bar{\sigma^{\prime}}^{2}\epsilon^{2}t+\sqrt{8tB_{p}^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{6\delta}}+2\left(\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\right) \tag{62}\\
\leq{}&C(\epsilon,\delta,t)+2\left(\log\mathbb{P}_{\hat{f}_{t}^{\mathrm{MLE}}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}_{t+1}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\right) \tag{63}\\
&+2\left(\log\mathbb{P}_{\hat{f}_{t+1}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})-\log\mathbb{P}_{\hat{f}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\right)
\end{align*}
\displaystyle\leq C(ϵ,δ,t)+2CLϵt+2β1(ϵ,δ,t)𝐶italic-ϵ𝛿𝑡2subscript𝐶𝐿italic-ϵ𝑡2subscript𝛽1italic-ϵ𝛿𝑡\displaystyle C(\epsilon,\delta,t)+2C_{L}\epsilon t+2\beta_{1}(\epsilon,\delta% ,t)italic_C ( italic_ϵ , italic_δ , italic_t ) + 2 italic_C start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT italic_ϵ italic_t + 2 italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_ϵ , italic_δ , italic_t ) (64)
=\displaystyle== β2(ϵ,δ,t)+2β1(ϵ,δ,t),subscript𝛽2italic-ϵ𝛿𝑡2subscript𝛽1italic-ϵ𝛿𝑡\displaystyle\beta_{2}(\epsilon,\delta,t)+2\beta_{1}(\epsilon,\delta,t),italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_ϵ , italic_δ , italic_t ) + 2 italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_ϵ , italic_δ , italic_t ) , (65)

where $C(\epsilon,\delta,t)=8H_{\sigma}\bar{\sigma^{\prime}}^{2}\epsilon^{2}t+\sqrt{8tB_{p}^{2}\log\frac{\pi^{2}t^{2}\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{6\delta}}$ and $\beta_{2}(\epsilon,\delta,t)=C(\epsilon,\delta,t)+2C_{L}\epsilon t$. The inequality (59) follows by the fact that $(a+b)^{2}\leq 2a^{2}+2b^{2},\forall a,b\in\mathbb{R}$.
The inequality (61) follows because $\|\hat{f}_{t+1}-\hat{f}\|_{\infty}\leq\epsilon$. The inequality (62) follows by Lem. D.3 (with a reset of '$\delta$'). The inequality (63) follows because

\[\log\mathbb{P}_{\hat{f}_{t}^{\mathrm{MLE}}}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t})\geq\log\mathbb{P}_{f}((x_{\tau},x_{\tau}^{\prime},\mathbf{1}_{\tau})_{\tau=1}^{t}).\]

The inequality (64) follows by the fact that $\hat{f}_{t+1}\in\mathcal{B}^{t+1}_{f}$ and Lem. C.3.
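As a minimal illustration of how these constants combine, the following Python sketch evaluates $C(\epsilon,\delta,t)$ and $\beta_{2}(\epsilon,\delta,t)$ directly from their definitions. The function name `confidence_constants`, its argument names, and the choice to pass the logarithm of the covering number (rather than the covering number itself) are assumptions made here for illustration only.

```python
import math


def confidence_constants(eps, delta, t, H_sigma, sigma_bar_prime, B_p, C_L, log_cover):
    """Evaluate C(eps, delta, t) and beta_2(eps, delta, t) from their definitions.

    log_cover stands for log N(B_f, eps, ||.||_inf). Passing the logarithm is an
    illustrative choice that avoids overflow when the covering number is huge.
    """
    # log( pi^2 t^2 N / (6 delta) ), expanded into a sum of logarithms
    log_term = (2.0 * math.log(math.pi) + 2.0 * math.log(t)
                + log_cover - math.log(6.0 * delta))
    C = (8.0 * H_sigma * sigma_bar_prime**2 * eps**2 * t
         + math.sqrt(8.0 * t * B_p**2 * log_term))
    beta_2 = C + 2.0 * C_L * eps * t
    return C, beta_2


# Hypothetical constants, chosen only to exercise the formulas.
C, beta_2 = confidence_constants(eps=0.01, delta=0.1, t=100, H_sigma=1.0,
                                 sigma_bar_prime=0.25, B_p=1.0, C_L=1.0,
                                 log_cover=5.0)
```

With $\epsilon$ of order $1/t$, the terms $8H_{\sigma}\bar{\sigma^{\prime}}^{2}\epsilon^{2}t$ and $2C_{L}\epsilon t$ stay bounded, so the square-root term dominates in this sketch.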

Furthermore,

\begin{align}
\sum_{\tau=1}^{t}\left(\hat{p}^{t+1}_{\tau}-p_{\tau}\right)^{2}&=\sum_{\tau=1}^{t}\left(\sigma\left(\hat{f}_{t+1}(x_{\tau})-\hat{f}_{t+1}(x_{\tau}^{\prime})\right)-\sigma\left(f(x_{\tau})-f(x_{\tau}^{\prime})\right)\right)^{2} \tag{66}\\
&\geq\sum_{\tau=1}^{t}{\underline{\sigma}^{\prime}}^{2}\left(\left(\hat{f}_{t+1}(x_{\tau})-\hat{f}_{t+1}(x_{\tau}^{\prime})\right)-\left(f(x_{\tau})-f(x_{\tau}^{\prime})\right)\right)^{2}, \tag{67}
\end{align}

where the inequality follows by the mean value theorem. The conclusion then follows.
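The mean value step behind (67) can be checked numerically. The sketch below assumes the logistic link $\sigma(z)=1/(1+e^{-z})$ and arguments confined to an interval $[-R,R]$, so that the lower bound on the derivative is $\underline{\sigma}^{\prime}=\sigma^{\prime}(R)$. The link function, the value of $R$, and all names are illustrative assumptions rather than quantities fixed by the proof.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


random.seed(0)
R = 3.0                                 # assumed bound on |f(x) - f(x')|
sigma_prime_lower = sigmoid_prime(R)    # minimum of sigma' on [-R, R] for the logistic link

mvt_holds = True
for _ in range(10_000):
    a = random.uniform(-R, R)
    b = random.uniform(-R, R)
    lhs = (sigmoid(a) - sigmoid(b)) ** 2
    rhs = sigma_prime_lower**2 * (a - b) ** 2
    # sigma(a) - sigma(b) = sigma'(xi) (a - b) for some xi between a and b
    if lhs < rhs - 1e-12:
        mvt_holds = False
```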

Appendix E Proof of Thm. 3.6

Before proving Thm. 3.6, we first conduct a black-box analysis in Sec. E.1 to bound the pointwise error for a generic RKHS with a generic learning scheme, which may be of independent interest.

E.1 Black-box analysis on the pointwise inference error in a generic RKHS

Suppose we have a generic RKHS $\tilde{\mathcal{H}}$ with a generic positive semidefinite kernel function $\tilde{k}(\cdot,\cdot)$. After obtaining some information (preference information in this paper) on a sequence $\tilde{x}_{1},\tilde{x}_{2},\ldots,\tilde{x}_{t-1}$, a learning scheme outputs a learnt uncertainty set,

\begin{equation}
\mathcal{S}_{t}=\left\{\tilde{h}\in\mathcal{B}\;\middle|\;\sum_{\tau=1}^{t-1}\left(\tilde{h}(\tilde{x}_{\tau})-h(\tilde{x}_{\tau})\right)^{2}\leq\tilde{\beta}_{t}\right\}, \tag{68}
\end{equation}

where $\mathcal{B}$ is a function space ball with radius $\tilde{B}$ in $\tilde{\mathcal{H}}$, $h\in\mathcal{B}$ is the ground truth function and $\tilde{\beta}_{t}$ quantifies the size of this confidence set. Let $\tilde{\mathcal{X}}$ denote the function input set, which is assumed to be compact. We introduce the function,

\begin{equation}
\tilde{\sigma}_{t}^{2}(\tilde{x})=\tilde{k}\left(\tilde{x},\tilde{x}\right)-\tilde{k}(\tilde{x}_{1:t-1},\tilde{x})^{\top}\left(\tilde{K}_{t-1}+\lambda I\right)^{-1}\tilde{k}\left(\tilde{x}_{1:t-1},\tilde{x}\right), \tag{69}
\end{equation}

where $\lambda$ is a positive constant and $\tilde{K}_{t-1}=(\tilde{k}(\tilde{x}_{i},\tilde{x}_{j}))_{i\in[t-1],j\in[t-1]}$. We have the following theorem.

Theorem E.1.

$\forall\tilde{h}\in\mathcal{S}_{t+1},\tilde{x}\in\tilde{\mathcal{X}}$, we have,

\begin{equation}
|\tilde{h}(\tilde{x})-h(\tilde{x})|\leq 2\left(\tilde{B}+\lambda^{-1/2}\tilde{\beta}_{t+1}^{1/2}\right)\tilde{\sigma}_{t+1}(\tilde{x}). \tag{70}
\end{equation}
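As a numerical sanity check of the bound (70), the following sketch builds two functions in the RKHS of a squared-exponential kernel, takes $\tilde{\beta}_{t+1}$ to be the smallest value for which $\tilde{h}\in\mathcal{S}_{t+1}$, and compares both sides of (70) on random test points. The kernel choice, the helper names (`rbf_kernel`, `sigma_tilde`, `random_rkhs_function`, `evaluate`), and all numerical values are assumptions for illustration; NumPy is assumed to be available.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def sigma_tilde(x, X_hist, lam):
    """Square root of Eq. (69), with the rows of X_hist as the observed sequence."""
    K = rbf_kernel(X_hist, X_hist)
    k_vec = rbf_kernel(X_hist, x)                        # shape (t, m)
    solved = np.linalg.solve(K + lam * np.eye(len(X_hist)), k_vec)
    var = 1.0 - np.sum(k_vec * solved, axis=0)           # k(x, x) = 1 for this kernel
    return np.sqrt(np.maximum(var, 0.0))


def random_rkhs_function(Z, radius, rng):
    """Weights alpha of h = sum_i alpha_i k(z_i, .) with RKHS norm at most radius."""
    alpha = rng.normal(size=len(Z))
    norm = np.sqrt(alpha @ rbf_kernel(Z, Z) @ alpha)
    return alpha * radius * rng.uniform() / norm


def evaluate(alpha, X, Z):
    """Evaluate h = sum_i alpha_i k(z_i, .) at the rows of X."""
    return rbf_kernel(X, Z) @ alpha


rng = np.random.default_rng(0)
B_tilde, lam, t = 2.0, 0.1, 12
Z = rng.uniform(0.0, 1.0, size=(20, 2))        # expansion centers (hypothetical)
X_hist = rng.uniform(0.0, 1.0, size=(t, 2))    # x_1, ..., x_t
X_test = rng.uniform(0.0, 1.0, size=(200, 2))  # points at which (70) is checked

h = random_rkhs_function(Z, B_tilde, rng)        # plays the role of the ground truth
h_tilde = random_rkhs_function(Z, B_tilde, rng)  # a candidate in the ball

# Smallest beta for which h_tilde belongs to S_{t+1}
beta = np.sum((evaluate(h_tilde, X_hist, Z) - evaluate(h, X_hist, Z)) ** 2)

lhs = np.abs(evaluate(h_tilde, X_test, Z) - evaluate(h, X_test, Z))
rhs = 2.0 * (B_tilde + np.sqrt(beta / lam)) * sigma_tilde(X_test, X_hist, lam)
bound_holds = bool(np.all(lhs <= rhs + 1e-9))
```

The check covers only one kernel and one random draw, so it illustrates the statement rather than replacing the proof.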
Proof.

For simplicity, we use $\phi(\tilde{x})$ to denote the function $\tilde{k}(\tilde{x},\cdot)$, where $\phi:\mathbb{R}^{\tilde{d}}\to\tilde{\mathcal{H}}$ maps a finite-dimensional point $\tilde{x}\in\mathbb{R}^{\tilde{d}}$ to the RKHS $\tilde{\mathcal{H}}$. We also use $h_{1}^{\top}h_{2}$ to denote the inner product of two functions $h_{1},h_{2}$ from the RKHS $\tilde{\mathcal{H}}$.
Therefore, $h(\tilde{x})=\langle h,\tilde{k}(\tilde{x},\cdot)\rangle_{\tilde{k}}=h^{\top}\phi(\tilde{x})$ and $\tilde{k}(\tilde{x},\bar{\tilde{x}})=\langle\tilde{k}(\tilde{x},\cdot),\tilde{k}(\bar{\tilde{x}},\cdot)\rangle=\phi(\tilde{x})^{\top}\phi(\bar{\tilde{x}})$, $\forall\tilde{x},\bar{\tilde{x}}\in\tilde{\mathcal{X}}$. We can introduce the feature map

\[\Phi_{t}\vcentcolon=\left[\phi(\tilde{x}_{1})^{\top},\ldots,\phi(\tilde{x}_{t})^{\top}\right]^{\top},\]

we then get the kernel matrix $\tilde{K}_{t}=\Phi_{t}\Phi_{t}^{\top}=(\tilde{k}(\tilde{x}_{i},\tilde{x}_{j}))_{i,j\in[t]}$, $\tilde{k}_{t}(\tilde{x})=\Phi_{t}\phi(\tilde{x})=(\tilde{k}(\tilde{x},\tilde{x}_{i}))_{i\in[t]}$ for all $\tilde{x}\in\tilde{\mathcal{X}}$ and $h_{1:t}=\Phi_{t}h$.

Note that when the Hilbert space $\tilde{\mathcal{H}}$ is finite-dimensional, $\Phi_{t}$ is interpreted as an ordinary finite-dimensional matrix. In the more general setting where $\tilde{\mathcal{H}}$ can be an infinite-dimensional space, $\Phi_{t}$ is the evaluation operator $\tilde{\mathcal{H}}\to\mathbb{R}^{t}$ defined as $\Phi_{t}h\vcentcolon=[h(\tilde{x}_{1}),\cdots,h(\tilde{x}_{t})]^{\top},\forall h\in\tilde{\mathcal{H}}$, with $\Phi_{t}^{\top}:\mathbb{R}^{t}\to\tilde{\mathcal{H}}$ as its adjoint operator. To simplify notation, we use $I$ to denote the identity mapping in both the RKHS $\tilde{\mathcal{H}}$ and $\mathbb{R}^{t}$; its specific meaning depends on the context.

Since the matrices/operators $(\Phi_{t}^{\top}\Phi_{t}+\lambda I)$ and $(\Phi_{t}\Phi_{t}^{\top}+\lambda I)$ are strictly positive definite and

\[(\Phi_{t}^{\top}\Phi_{t}+\lambda I)\Phi_{t}^{\top}=\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I),\]

we have

\begin{equation}
\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}=(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}. \tag{71}
\end{equation}
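Identity (71) is easy to verify numerically in the finite-dimensional case. The sketch below uses a random matrix in place of $\Phi_{t}$; the dimensions, the value of $\lambda$, and the variable names are arbitrary choices made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
t, D, lam = 5, 8, 0.3                  # t observed points, D-dimensional features
Phi = rng.normal(size=(t, D))          # stands in for the operator Phi_t

# Left-hand side of (71): Phi^T (Phi Phi^T + lam I)^{-1}, a D x t matrix
left = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.eye(t))
# Right-hand side of (71): (Phi^T Phi + lam I)^{-1} Phi^T, also D x t
right = np.linalg.inv(Phi.T @ Phi + lam * np.eye(D)) @ Phi.T

max_gap = float(np.max(np.abs(left - right)))
```

The left-hand side inverts a $t\times t$ matrix while the right-hand side inverts one of the feature dimension, which is why the identity lets the proof move between kernel-matrix and feature-space expressions.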

Also, from the definitions above, $(\Phi_{t}^{\top}\Phi_{t}+\lambda I)\phi(\tilde{x})=\Phi_{t}^{\top}\tilde{k}_{t}(\tilde{x})+\lambda\phi(\tilde{x})$, and thus $\phi(\tilde{x})=(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}\tilde{k}_{t}(\tilde{x})+\lambda(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})$. Hence, from Eq. (71) we deduce that

\begin{equation}
\phi(\tilde{x})=\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}\tilde{k}_{t}(\tilde{x})+\lambda(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x}), \tag{72}
\end{equation}

which gives

\begin{equation}
\phi(\tilde{x})^{\top}\phi(\tilde{x})=\tilde{k}_{t}(\tilde{x})^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}\tilde{k}_{t}(\tilde{x})+\lambda\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x}), \tag{73}
\end{equation}

by multiplying both sides of Eq. (72) with $\phi(\tilde{x})^{\top}$. This implies

\begin{equation}
\lambda\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})=\tilde{k}(\tilde{x},\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}\tilde{k}_{t}(\tilde{x})=\tilde{\sigma}_{t+1}^{2}(\tilde{x}), \tag{74}
\end{equation}
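The first equality in (74) can likewise be confirmed numerically with explicit finite-dimensional features. In the sketch below, a random vector plays the role of $\phi(\tilde{x})$ and the rows of a random matrix play the role of $\phi(\tilde{x}_{1}),\ldots,\phi(\tilde{x}_{t})$; these stand-ins, the dimensions, and the value of $\lambda$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t, D, lam = 6, 4, 0.2
Phi = rng.normal(size=(t, D))      # row i plays the role of phi(x_i)
phi_x = rng.normal(size=D)         # phi(x) at a query point

K = Phi @ Phi.T                    # kernel matrix K_t
k_x = Phi @ phi_x                  # kernel vector k_t(x)

# Feature-space form: lam * phi(x)^T (Phi^T Phi + lam I)^{-1} phi(x)
feature_side = lam * (phi_x @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), phi_x))
# Kernel form: k(x, x) - k_t(x)^T (K_t + lam I)^{-1} k_t(x)
kernel_side = phi_x @ phi_x - k_x @ np.linalg.solve(K + lam * np.eye(t), k_x)

gap = abs(feature_side - kernel_side)
```

Here $t>D$, so $\tilde{K}_{t}$ itself is singular; the regularization $\lambda I$ is what keeps both linear systems well posed.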

where the second equality follows by the definition of $\tilde{\sigma}_{t+1}(\tilde{x})$. Now observe that $\forall\tilde{h}\in\mathcal{B}$,

\begin{align}
& \left|\tilde{h}(\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}\tilde{h}_{1:t}\right| \tag{75}\\
=\;& \left|\phi(\tilde{x})^{\top}\tilde{h}-\phi(\tilde{x})^{\top}\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}\Phi_{t}\tilde{h}\right| \tag{76}\\
=\;& \left|\phi(\tilde{x})^{\top}\tilde{h}-\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}\Phi_{t}\tilde{h}\right| \tag{77}\\
=\;& \left|\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)\tilde{h}-\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}\Phi_{t}\tilde{h}\right| \tag{78}\\
=\;& \left|\lambda\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\tilde{h}\right| \tag{79}\\
\leq\;& \left\lVert\lambda(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})\right\rVert_{\tilde{k}}\left\lVert\tilde{h}\right\rVert_{\tilde{k}} \tag{80}\\
=\;& \left\lVert\tilde{h}\right\rVert_{\tilde{k}}\sqrt{\lambda\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\lambda I(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})} \tag{81}\\
\leq\;& \tilde{B}\sqrt{\lambda\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})} \tag{82}\\
=\;& \tilde{B}\,\tilde{\sigma}_{t+1}(\tilde{x}), \tag{83}
\end{align}

where the equality (77) uses Eq. (71), the inequality (80) is by Cauchy-Schwarz, the inequality (82) follows from the assumption that $\|\tilde{h}\|_{\tilde{k}}\leq\tilde{B}$ and the fact that $\Phi_{t}^{\top}\Phi_{t}$ is positive semidefinite, and the equality (83) is from Eq. (74). Defining $\Delta_{1:t}=\tilde{h}_{1:t}-h_{1:t}$, we have

\begin{align}
& \left|\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}\Delta_{1:t}\right| \tag{84}\\
=\;& \left|\phi(\tilde{x})^{\top}\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}\Delta_{1:t}\right| \tag{85}\\
=\;& \left|\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}\Delta_{1:t}\right| \tag{86}\\
\leq\;& \left\lVert(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1/2}\phi(\tilde{x})\right\rVert_{\tilde{k}}\left\lVert(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1/2}\Phi_{t}^{\top}\Delta_{1:t}\right\rVert_{\tilde{k}} \tag{87}\\
=\;& \sqrt{\phi(\tilde{x})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\phi(\tilde{x})}\sqrt{(\Phi_{t}^{\top}\Delta_{1:t})^{\top}(\Phi_{t}^{\top}\Phi_{t}+\lambda I)^{-1}\Phi_{t}^{\top}\Delta_{1:t}} \tag{88}\\
=\;& \lambda^{-1/2}\tilde{\sigma}_{t+1}(\tilde{x})\sqrt{\Delta_{1:t}^{\top}\Phi_{t}\Phi_{t}^{\top}(\Phi_{t}\Phi_{t}^{\top}+\lambda I)^{-1}\Delta_{1:t}} \tag{89}\\
=\;& \lambda^{-1/2}\tilde{\sigma}_{t+1}(\tilde{x})\sqrt{\Delta_{1:t}^{\top}\tilde{K}_{t}(\tilde{K}_{t}+\lambda I)^{-1}\Delta_{1:t}} \tag{90}\\
\leq\;& \lambda^{-1/2}\tilde{\sigma}_{t+1}(\tilde{x})\sqrt{\Delta_{1:t}^{\top}\Delta_{1:t}} \tag{91}\\
\leq\;& \lambda^{-1/2}\tilde{\beta}_{t+1}^{1/2}\tilde{\sigma}_{t+1}(\tilde{x}), \tag{92}
\end{align}

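where the equality (86) is from Eq. (71), the inequality (87) is by Cauchy-Schwarz, and the equality (89) uses both Eq. (71) and Eq. (74). The inequality (91) holds because the eigenvalues of $\tilde{K}_{t}(\tilde{K}_{t}+\lambda I)^{-1}$ lie in $[0,1)$, since $\tilde{K}_{t}$ is positive semidefinite and $\lambda>0$. We can finally derive,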

\begin{align}
& \left|\tilde{h}(\tilde{x})-h(\tilde{x})\right| \tag{93}\\
=\;& \left|\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}(\tilde{h}_{1:t}-h_{1:t})-\left(h(\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}h_{1:t}\right)+\left(\tilde{h}(\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}\tilde{h}_{1:t}\right)\right| \tag{94}\\
\leq\;& \left|\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}(\tilde{h}_{1:t}-h_{1:t})\right|+\left|h(\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}h_{1:t}\right|+\left|\tilde{h}(\tilde{x})-\tilde{k}_{t}(\tilde{x})^{\top}(\tilde{K}_{t}+\lambda I)^{-1}\tilde{h}_{1:t}\right| \tag{95}\\
\leq\;& \Big(2\tilde{B}+\lambda^{-1/2}\tilde{\beta}_{t+1}^{1/2}\Big)\tilde{\sigma}_{t+1}(\tilde{x}), \tag{96}
\end{align}

where the equality (94) follows by adding and subtracting the same terms, the inequality (95) follows by the triangle inequality, and the last inequality follows by combining the bound (83), applied to both $h$ and $\tilde{h}$, with the bound (92). The conclusion then follows. ∎
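As a numerical sanity check of the identity (74) and of the bounds (83) and (92), the following minimal sketch works in a finite-dimensional setting. Everything specific in it is an assumption made for illustration only: the feature map `phi` (so that the kernel is $\phi(x)^{\top}\phi(y)$), the sample points `xs`, the weight vector `w` standing in for $\tilde{h}$, the vector `delta` standing in for $\Delta_{1:t}$, and the small linear solver `solve`. The Euclidean norm of `w` plays the role of $\|\tilde{h}\|_{\tilde{k}}$.

```python
import math

def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - tail) / M[i][i]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    # Hypothetical feature map; the induced kernel is k(x, y) = phi(x) . phi(y).
    return [1.0, x, x * x]

lam = 0.5                        # regularizer lambda > 0
xs = [-1.0, -0.3, 0.4, 1.2]      # illustrative sample points
x_new = 0.7                      # query point
w = [0.3, -1.1, 0.8]             # weights of the test function h(x) = w . phi(x)
delta = [0.2, -0.1, 0.05, 0.3]   # arbitrary stand-in for Delta_{1:t}

Phi = [phi(x) for x in xs]       # t x d matrix with rows phi(x_i)
t, d = len(Phi), len(Phi[0])
p_new = phi(x_new)

# K_t + lam*I (t x t) and Phi^T Phi + lam*I (d x d).
K_reg = [[dot(Phi[i], Phi[j]) + (lam if i == j else 0.0) for j in range(t)]
         for i in range(t)]
G_reg = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
k_vec = [dot(row, p_new) for row in Phi]

# Both sides of identity (74).
sigma2_kernel = dot(p_new, p_new) - dot(k_vec, solve(K_reg, k_vec))
sigma2_feature = lam * dot(p_new, solve(G_reg, p_new))
sigma = math.sqrt(sigma2_kernel)

# Bound (83): the regression residual is at most ||h|| * sigma.
h_obs = [dot(row, w) for row in Phi]
residual = abs(dot(p_new, w) - dot(k_vec, solve(K_reg, h_obs)))
bound_83 = math.sqrt(dot(w, w)) * sigma

# Bound (91): |k^T (K + lam I)^{-1} Delta| <= lam^{-1/2} * sigma * ||Delta||.
lhs_91 = abs(dot(k_vec, solve(K_reg, delta)))
rhs_91 = sigma / math.sqrt(lam) * math.sqrt(dot(delta, delta))

print(abs(sigma2_kernel - sigma2_feature) < 1e-9)
print(residual <= bound_83, lhs_91 <= rhs_91)
```

The check covers a single instance, so it illustrates the algebra rather than proving it. The step from (91) to (92) is not exercised, since it depends on the confidence-set radius $\tilde{\beta}_{t+1}$, which is not modeled here.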

Remark E.2.

The proof idea is inspired by the proof of Thm. 2 in (Chowdhury & Gopalan, 2017b).

E.2 Main proof of Thm. 3.6

We set the generic RKHS $\tilde{\mathcal{H}}$ to be the augmented RKHS with the additive kernel function $k^{ff'}$, the function space ball to be $\mathcal{B}_{ff'}$, $\tilde{B}=2B$, and the confidence set to be

\[
\mathcal{S}_{t} := \left\{\tilde{f}(x)-\tilde{f}(x') \,\middle|\, \tilde{f}\in\mathcal{B}_{f},\ \sum_{\tau=1}^{t-1}\big((\tilde{f}(x_{\tau})-\tilde{f}(x_{\tau}'))-(f(x_{\tau})-f(x'_{\tau}))\big)^{2}\leq\beta(\epsilon,\nicefrac{\delta}{2},t-1)\right\}\subset\mathcal{B}_{ff'}.
\]

The desired result then follows by applying Thm. E.1.

Appendix F Proof of Lem. 4.1

It suffices to prove that for any feasible solution of Prob. (24), we can find a corresponding feasible solution of Prob. (25) with the same objective value and that the inverse direction also holds.

  1.

    In this part, we first show that for any feasible solution of Prob. (24), we can find a corresponding feasible solution of Prob. (25) with the same objective value. Let $\tilde{f}$ be a feasible solution of Prob. (24). We construct $\tilde{Z}_{0:t}=(\tilde{f}(x_{\tau}))_{\tau=0}^{t}$ and $\tilde{z}=\tilde{f}(x)$. Consider the minimum-norm interpolation problem,

    \begin{align}
    \min_{s\in\mathcal{B}_{f}}\quad & \|s\|^{2} \tag{97}\\
    \text{subject to}\quad & s(x_{\tau})=\tilde{z}_{\tau},\ \forall\tau\in\{0\}\cup[t], \notag\\
    & s(x)=\tilde{z}. \notag
    \end{align}

    By the representer theorem, Prob. (97) admits an optimal solution of the form $\alpha^{\top}k_{0:t,x}(\cdot)$, where $k_{0:t,x}:=(k(w,\cdot))_{w\in\{x_{0},\cdots,x_{t},x\}}$. So Prob. (97) can be reduced to

    \begin{align}
    \min_{\alpha\in\mathbb{R}^{t+2}}\quad & \alpha^{\top}K_{0:t,x}\alpha \tag{98}\\
    \text{subject to}\quad & K_{0:t,x}\alpha=\left[\begin{array}{l}\tilde{Z}_{0:t}\\ \tilde{z}\end{array}\right]. \notag
    \end{align}

    Hence, by solving Prob. (98), we obtain the minimum squared norm subject to the interpolation constraints as

    \[
    \left[\begin{array}{l}\tilde{Z}_{0:t}\\ \tilde{z}\end{array}\right]^{\top}K_{0:t,x}^{-1}\left[\begin{array}{l}\tilde{Z}_{0:t}\\ \tilde{z}\end{array}\right].
    \]

    Since $\tilde{f}$ itself is an interpolant by construction of $(\tilde{Z}_{0:t},\tilde{z})$, we have

    \[
    \left[\begin{array}{l}\tilde{Z}_{0:t}\\ \tilde{z}\end{array}\right]^{\top}K_{0:t,x}^{-1}\left[\begin{array}{l}\tilde{Z}_{0:t}\\ \tilde{z}\end{array}\right]\leq\|\tilde{f}\|^{2}\leq B^{2}.
    \]

    And since the log-likelihood only depends on $\tilde{Z}_{0:t}$, it holds that

    \[
    \ell(\tilde{Z}_{0:t}|\mathcal{D}_{t})=\ell_{t}(\tilde{f})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{1}(\epsilon,\delta,t).
    \]

    And the objectives satisfy,

    \[
    \tilde{z}-\tilde{z}_{t}=\tilde{f}(x)-\tilde{f}(x_{t}).
    \]

    Therefore, $(\tilde{Z}_{0:t},\tilde{z})$ is a feasible solution for Prob. (25) with the same objective as $\tilde{f}$ for Prob. (24).

  2.

    We then show that for any feasible solution of Prob. (25), we can find a corresponding feasible solution of Prob. (24) with the same objective value. Let $(Z_{0:t},z)$ be a feasible solution of Prob. (25). We construct

    \[
    \tilde{f}_{z}=\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]^{\top}K_{0:t,x}^{-1}k_{0:t,x}(\cdot).
    \]

    Hence,

    f~z2=[Z0:tz]K0:t,x1[Z0:tz]B2.superscriptnormsubscript~𝑓𝑧2superscriptdelimited-[]subscript𝑍:0𝑡𝑧topsuperscriptsubscript𝐾:0𝑡𝑥1delimited-[]subscript𝑍:0𝑡𝑧superscript𝐵2\|\tilde{f}_{z}\|^{2}=\left[\begin{array}[]{l}{Z}_{0:t}\\ {z}\end{array}\right]^{\top}K_{0:t,x}^{-1}\left[\begin{array}[]{l}{Z}_{0:t}\\ {z}\end{array}\right]\leq B^{2}.∥ over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = [ start_ARRAY start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 : italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z end_CELL end_ROW end_ARRAY ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 0 : italic_t , italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ start_ARRAY start_ROW start_CELL italic_Z start_POSTSUBSCRIPT 0 : italic_t end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z end_CELL end_ROW end_ARRAY ] ≤ italic_B start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

    And it can be checked that f~z(xτ)=zτ,τ{0}[t]formulae-sequencesubscript~𝑓𝑧subscript𝑥𝜏subscript𝑧𝜏for-all𝜏0delimited-[]𝑡\tilde{f}_{z}(x_{\tau})=z_{\tau},\forall\tau\in\{0\}\cup[t]over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) = italic_z start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , ∀ italic_τ ∈ { 0 } ∪ [ italic_t ] and f~z(x)=zsubscript~𝑓𝑧𝑥𝑧\tilde{f}_{z}(x)=zover~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_x ) = italic_z. So t(f~z)=(Z0:t|𝒟t)t(f^tMLE)β1(ϵ,δ,t)subscript𝑡subscript~𝑓𝑧conditionalsubscript𝑍:0𝑡subscript𝒟𝑡subscript𝑡subscriptsuperscript^𝑓MLE𝑡subscript𝛽1italic-ϵ𝛿𝑡\ell_{t}(\tilde{f}_{z})=\ell(Z_{0:t}|\mathcal{D}_{t})\geq\ell_{t}(\hat{f}^{% \mathrm{MLE}}_{t})-\beta_{1}(\epsilon,\delta,t)roman_ℓ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ) = roman_ℓ ( italic_Z start_POSTSUBSCRIPT 0 : italic_t end_POSTSUBSCRIPT | caligraphic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ≥ roman_ℓ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( over^ start_ARG italic_f end_ARG start_POSTSUPERSCRIPT roman_MLE end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_ϵ , italic_δ , italic_t ). And the objectives satisfy f~z(x)f~z(xt)=zztsubscript~𝑓𝑧𝑥subscript~𝑓𝑧subscript𝑥𝑡𝑧subscript𝑧𝑡\tilde{f}_{z}(x)-\tilde{f}_{z}(x_{t})=z-z_{t}over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_x ) - over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = italic_z - italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. So it is proved that for any feasible solution of Prob. (25), we can find a corresponding feasible solution of Prob. (24) with the same objective value.

The desired result then follows.
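The construction in step 2 can be checked numerically. The following is a minimal sketch, assuming a squared-exponential kernel, randomly drawn points, and arbitrary prescribed values (none of these choices come from the paper). It builds the interpolant $\tilde{f}_{z}$, confirms that it reproduces the prescribed values, and confirms that its squared RKHS norm equals the quadratic form in the norm constraint.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b (illustrative choice)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 3.0, size=(6, 2))   # stands in for x_0, ..., x_t and the new x
values = rng.normal(size=6)                    # stands in for the stacked vector (Z_{0:t}, z)

gram = rbf_kernel(points, points)              # K_{0:t,x}
weights = np.linalg.solve(gram, values)        # K_{0:t,x}^{-1} [Z_{0:t}; z]

def interpolant(query):
    """Evaluate f_z(query) = [Z; z]^T K^{-1} k(query)."""
    return rbf_kernel(np.atleast_2d(query), points) @ weights

# The interpolant reproduces the prescribed values at every point.
fitted = np.array([interpolant(p)[0] for p in points])
max_interp_error = np.max(np.abs(fitted - values))

# The squared RKHS norm of sum_i w_i k(x_i, .) is w^T K w,
# which coincides with [Z; z]^T K^{-1} [Z; z].
norm_sq_from_weights = weights @ gram @ weights
norm_sq_quadratic_form = values @ weights
```

Solving the linear system rather than forming the explicit inverse is a numerical convenience in this sketch; it does not change the quantity being computed.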

Appendix G Elaboration on Remark 2.3

By Assumption 2.2, there exists a sufficiently large constant $B$ that upper bounds the norm of the ground-truth black-box function $f$. However, the exact value of this upper bound may be unknown in practice, while the execution of our algorithm relies on the knowledge of $B$ (in Problem (23), $B$ is a key parameter). So we need to guess the value of $B$. Suppose our guess is $\hat{B}$. It is possible that $\hat{B}$ is smaller than the ground-truth function norm $\|f\|$. To detect such a wrong guess, we observe that, with a correct setting of $B$ such that $B\geq\|f\|$, Thm. 3.1 and the definition of the maximum likelihood estimate give, with high probability,

\[\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|B})\geq\ell_{t}(f)\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|B})-\beta_{1}(\epsilon,\delta,t|B),\]

where $\hat{f}^{\mathrm{MLE}}_{t|B}$ is the maximum likelihood estimate under the function norm bound $B$ and $\beta_{1}(\epsilon,\delta,t|B)$ is the corresponding parameter as defined in Thm. 3.1 with norm bound $B$. Since $2B$ is also a valid upper bound on $\|f\|$, we likewise have

\[\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|2B})\geq\ell_{t}(f)\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|2B})-\beta_{1}(\epsilon,\delta,t|2B).\]

Hence,

\[\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|B})\geq\ell_{t}(f)\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|2B})-\beta_{1}(\epsilon,\delta,t|2B).\]

That is to say, $\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|B})$ must be greater than or equal to $\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|2B})-\beta_{1}(\epsilon,\delta,t|2B)$ whenever $B$ is a valid upper bound on $\|f\|$.

Therefore, we can use the following heuristic: every time we find that

\[\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|\hat{B}})<\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t|2\hat{B}})-\beta_{1}(\epsilon,\delta,t|2\hat{B}),\]

we double the upper bound guess $\hat{B}$.
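The doubling test above can be sketched as a single update step. This is a minimal illustration rather than the authors' implementation: `mle_log_likelihood` and `beta_1` are hypothetical callables that the user would have to supply (the first returning the maximum log-likelihood under a given norm bound, the second returning the parameter of Thm. 3.1 for that bound), and the toy stand-ins at the bottom are invented purely to exercise the logic.

```python
from typing import Callable

def update_norm_bound_guess(
    b_hat: float,
    mle_log_likelihood: Callable[[float], float],
    beta_1: Callable[[float], float],
) -> float:
    """Return the (possibly doubled) guess of the norm bound.

    mle_log_likelihood(b) is assumed to return the maximum log-likelihood over
    functions with norm at most b; beta_1(b) is assumed to return the confidence
    parameter for norm bound b. Both are hypothetical user-supplied callables.
    """
    threshold = mle_log_likelihood(2.0 * b_hat) - beta_1(2.0 * b_hat)
    if mle_log_likelihood(b_hat) < threshold:
        return 2.0 * b_hat  # the current guess is detected to be too small
    return b_hat

# Toy stand-ins (not from the paper): the likelihood stops improving once b reaches 3.
def toy_log_likelihood(b: float) -> float:
    return -10.0 / min(b, 3.0)

def toy_beta(b: float) -> float:
    return 0.5

guess_after_first_check = update_norm_bound_guess(1.0, toy_log_likelihood, toy_beta)
guess_after_second_check = update_norm_bound_guess(2.0, toy_log_likelihood, toy_beta)
guess_after_third_check = update_norm_bound_guess(4.0, toy_log_likelihood, toy_beta)
```

In this toy run the guess is doubled from 1 to 2 and from 2 to 4, and then stays at 4, because enlarging the bound further no longer raises the likelihood by more than the confidence parameter.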

Appendix H Jointly optimize $x$, $Z_{0:t}$ and $z$ for the problem (25).

For medium-dimensional problems ($d>4$), we can jointly optimize $x$, $Z_{0:t}$, and $z$ using a nonlinear programming solver started from multiple random initial conditions. That is, we also treat $x$ in the problem (23) as an optimization variable. In this way, we lose convexity but need to solve the problem (23) only once in each step $t$.

More specifically, we solve the optimization problem (99),

\[
\begin{aligned}
\max_{x\in\mathbb{R}^{d},Z_{0:t}\in\mathbb{R}^{t+1},z\in\mathbb{R}}\quad & z-z_{t}\\
\text{subject to}\quad & \left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]^{\top}K_{0:t,x}^{-1}\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]\leq B^{2},\\
& \ell(Z_{0:t}|\mathcal{D}_{t})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{1}(\epsilon,\delta,t).
\end{aligned}
\tag{99}
\]

The only constraint that involves $x$ is

\[\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]^{\top}K_{0:t,x}^{-1}\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]\leq B^{2}. \tag{100}\]

Applying the block matrix inversion formula, we derive that the left-hand side is equal to

\[
Z_{0:t}^{\top}K_{0:t}^{-1}Z_{0:t}+\frac{1}{k(x,x)-k_{t}(x)^{\top}K_{0:t}^{-1}k_{t}(x)}\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right]^{\top}\left[\begin{array}{l}K_{0:t}^{-1}k_{t}(x)\\ -1\end{array}\right]\left[\begin{array}{l}K_{0:t}^{-1}k_{t}(x)\\ -1\end{array}\right]^{\top}\left[\begin{array}{l}Z_{0:t}\\ z\end{array}\right], \tag{101}
\]

where $k_{t}(x):=(k(x_{\tau},x))_{\tau=0}^{t}$.
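Expression (101) can be checked against the left-hand side of (100) numerically. The sketch below assumes a generic, randomly generated positive definite matrix in place of an actual kernel Gram matrix, and all variable names are illustrative. It shows that only $K_{0:t}^{-1}$ is needed, with $x$ entering through $k_{t}(x)$ and $k(x,x)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                            # plays the role of t + 1
a = rng.normal(size=(n + 1, n + 1))
full_gram = a @ a.T + (n + 1) * np.eye(n + 1)    # generic positive definite K_{0:t,x}

gram_t = full_gram[:n, :n]                       # K_{0:t}
k_t = full_gram[:n, n]                           # k_t(x)
k_xx = full_gram[n, n]                           # k(x, x)

z_hist = rng.normal(size=n)                      # Z_{0:t}
z_new = rng.normal()                             # z
stacked = np.append(z_hist, z_new)               # [Z_{0:t}; z]

# Left-hand side of (100): quadratic form with the inverse of the full matrix.
lhs = stacked @ np.linalg.solve(full_gram, stacked)

# Expression (101): uses only K_{0:t}^{-1}, k_t(x), and k(x, x).
gram_inv_k = np.linalg.solve(gram_t, k_t)
schur = k_xx - k_t @ gram_inv_k                  # k(x,x) - k_t(x)^T K_{0:t}^{-1} k_t(x)
direction = np.append(gram_inv_k, -1.0)          # [K_{0:t}^{-1} k_t(x); -1]
rhs = z_hist @ np.linalg.solve(gram_t, z_hist) + (stacked @ direction) ** 2 / schur
```

Because $K_{0:t}$ does not depend on $x$, its factorization could be computed once per step and reused across solver iterations; that reuse is a design observation about this sketch, not a claim about the authors' code.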

We can then apply a nonlinear programming solver such as Ipopt to solve the problem (99) from randomly sampled initial points. The best converged solution is then set as the next sample point $x_{t}$.

Appendix I Extension to the multiple-choice setting

In this paper, we mainly consider the setting where the human expresses a preference over only two choices, because of its low cognitive burden on the human user and the simplicity of its theoretical analysis. However, we can extend POP-BO to the multiple-choice setting, where the human compares multiple choices and reports the favorite one.

Suppose that in each step $\tau$, we aim to generate a batch of $q$ points. Then we can mix the new batch with the old batch generated in step $\tau-1$, and query the comparison oracle to report the favorite point among the $2q$ points.

Firstly, the confidence set of functions can be constructed similarly, using the likelihood ratio idea and the multiple-choice probabilistic preference model as in (Astudillo et al., 2023),

\[\mathbb{P}\left(x_{r}\text{ is the favorite}\right)=\frac{e^{f(x_{r})}}{\sum_{x\in\{\text{last batch and the new batch}\}}e^{f(x)}}. \tag{102}\]
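As a small illustration of the preference model (102), the sketch below computes the probability that each candidate is reported as the favorite, given the function values of all candidates in the mixed batch. The function name and the example values are assumptions made here for illustration.

```python
import math

def favorite_probabilities(utilities):
    """Multinomial-logit probabilities that each candidate is reported as the favorite.

    utilities: the values f(x) for all candidates (last batch and new batch).
    """
    shift = max(utilities)  # subtracting the max avoids overflow and leaves the ratios unchanged
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Four candidates (q = 2) with illustrative function values.
batch_probs = favorite_probabilities([0.3, -1.2, 0.9, 0.0])

# With only two candidates, the expression reduces to the logistic
# function of the difference of the two function values.
pair_probs = favorite_probabilities([1.0, 0.0])
```

The two-candidate case shows how (102) generalizes a pairwise comparison model based on the logistic function.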

Secondly, to generate the new batch, the basic idea is to apply a ‘bootstrap’-type technique. More specifically, we sequentially generate the new batch $x^{1},x^{2},\cdots,x^{q}$. When generating the new point $x^{r+1}$, we maximize the optimistic advantage of its value $z^{r+1}$ over the maximum of $z_{t-q+1:t},z^{1},\cdots,z^{r}$ by solving a problem similar to Problem (23). That is, we solve Problem (103) to generate the new point $x^{r+1}$ in the same batch,

\[
\begin{aligned}
\max_{x\in\mathbb{R}^{d},z\in\mathbb{R},z^{1:r}\in\mathbb{R}^{r},Z_{0:t}\in\mathbb{R}^{t+1}}\quad & z-\max\{z_{t-q+1},\cdots,z_{t},z^{1},\cdots,z^{r}\}\\
\text{subject to}\quad & \left[\begin{array}{l}Z_{0:t}\\ z^{1:r}\\ z\end{array}\right]^{\top}K_{0:t,x^{1:r},x}^{-1}\left[\begin{array}{l}Z_{0:t}\\ z^{1:r}\\ z\end{array}\right]\leq B^{2},\\
& \ell(Z_{0:t}|\mathcal{D}_{t})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{t},
\end{aligned}
\tag{103}
\]

which is equivalent to

\[
\begin{aligned}
\max_{x\in\mathbb{R}^{d},z\in\mathbb{R},v\in\mathbb{R},z^{1:r}\in\mathbb{R}^{r},Z_{0:t}\in\mathbb{R}^{t+1}}\quad & z-v\\
\text{subject to}\quad & \left[\begin{array}{l}Z_{0:t}\\ z^{1:r}\\ z\end{array}\right]^{\top}K_{0:t,x^{1:r},x}^{-1}\left[\begin{array}{l}Z_{0:t}\\ z^{1:r}\\ z\end{array}\right]\leq B^{2},\\
& \ell(Z_{0:t}|\mathcal{D}_{t})\geq\ell_{t}(\hat{f}^{\mathrm{MLE}}_{t})-\beta_{t},\\
& v\geq z_{t-i+1},\ i\in[q],\\
& v\geq z^{j},\ j\in[r],
\end{aligned}
\tag{104}
\]

by introducing an auxiliary variable $v\in\mathbb{R}$. Problem (104) can be solved efficiently by the nonlinear programming solver Ipopt.
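The equivalence between (103) and (104) is the standard epigraph reformulation of a maximum. A short sketch of the argument, where the shorthand $m$ is introduced here only for illustration:

```latex
\begin{aligned}
& m := \max\{z_{t-q+1},\cdots,z_{t},z^{1},\cdots,z^{r}\},\\
& \bigl(v\geq z_{t-i+1}\ \forall i\in[q]\bigr)\ \text{and}\ \bigl(v\geq z^{j}\ \forall j\in[r]\bigr)
  \iff v\geq m,\\
& \max_{v\geq m}\,(z-v) = z-m \quad \text{(attained at } v=m\text{)}.
\end{aligned}
```

For any fixed values of the remaining variables, maximizing over $v$ therefore recovers the objective of (103), while the non-smooth maximum appears only through linear inequality constraints, a form that smooth nonlinear programming solvers handle more readily.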

Appendix J Proof of Thm. 5.2

To prepare for the following analysis, we first give a useful lemma.

Lemma J.1 (Lemma 4, (Chowdhury & Gopalan, 2017b)).
\[\sum_{t=1}^{T}\sigma^{ff^{\prime}}_{t}\left((x_{t},x_{t}^{\prime})\right)\leq\sqrt{4(T+2)\gamma^{ff^{\prime}}_{T}}, \tag{105}\]

where $\sigma^{ff^{\prime}}_{t}$ is as defined in Eq. (17) and $\gamma^{ff^{\prime}}_{T}$ is as defined in Eq. (19).

Proof.

Apply Lemma 4 in (Chowdhury & Gopalan, 2017b) with the kernel function set to $k^{ff^{\prime}}$. ∎

For convenience, we use $\beta_{t}$ to denote $\beta(\epsilon,\delta/2,t)$. We can then analyze the regret of the optimistic algorithm.

\[
\begin{aligned}
R_{T}= & \sum_{t=1}^{T}[f(x^{\star})-f(x_{t})]\\
= & \sum_{t=1}^{T}[(f(x^{\star})-f(x_{t}^{\prime}))-(f(x_{t})-f(x_{t}^{\prime}))]\\
\leq & \sum_{t=1}^{T}[(\tilde{f}_{t}(x_{t})-\tilde{f}_{t}(x_{t}^{\prime}))-(f(x_{t})-f(x_{t}^{\prime}))]\\
\leq & \sum_{t=1}^{T}2(2B+\lambda^{-1/2}\beta_{t}^{1/2})\sigma^{ff^{\prime}}_{t}((x_{t},x_{t}^{\prime})),
\end{aligned}
\]

where the first inequality follows from the optimality of $(x_t,\tilde{f}_t)$ for the optimization problem in line 4 of Alg. 1, and the second inequality follows from Thm. 3.6 (note that $\beta(\epsilon,\delta/2,t-1)\leq\beta_t=\beta(\epsilon,\delta/2,t)$). Hence,

\[
\begin{aligned}
R_T &\leq \sum_{t=1}^{T} 2\left(2B+\lambda^{-1/2}\beta_{t}^{1/2}\right)\sigma^{ff'}_{t}((x_{t},x_{t}^{\prime}))\\
&\leq 2\left(2B+\lambda^{-1/2}\beta_{T}^{1/2}\right)\sum_{t=1}^{T}\sigma^{ff'}_{t}((x_{t},x_{t}^{\prime}))\\
&\leq 2\left(2B+\lambda^{-1/2}\beta_{T}^{1/2}\right)\sqrt{4(T+2)\gamma_{T}^{ff'}}\\
&= \mathcal{O}\left(\sqrt{\beta_{T}T\gamma^{ff'}_{T}}\right).
\end{aligned}
\]

Appendix K Proof of Thm. 5.4

We have

\[
\begin{aligned}
f(x^{\star})-f(x_{t^{\star}}) &= \left(f(x^{\star})-f(x_{t^{\star}}^{\prime})\right)-\left(f(x_{t^{\star}})-f(x_{t^{\star}}^{\prime})\right)\\
&\leq \left(\tilde{f}_{t^{\star}}(x_{t^{\star}})-\tilde{f}_{t^{\star}}(x_{t^{\star}}^{\prime})\right)-\left(f(x_{t^{\star}})-f(x_{t^{\star}}^{\prime})\right)\\
&\leq 2\left(2B+\lambda^{-1/2}\beta_{t^{\star}}^{1/2}\right)\sigma^{ff'}_{t^{\star}}((x_{t^{\star}},x_{t^{\star}}^{\prime})),
\end{aligned}
\]

where $\sigma^{ff'}_{t^{\star}}$ is as given in Eq. (17) with the kernel function $k^{ff'}((x_1,x_1^{\prime}),(x_2,x_2^{\prime}))=k(x_1,x_2)+k(x_1^{\prime},x_2^{\prime})$ and $\beta_{t^{\star}}=\beta(\epsilon,\delta/2,t^{\star})$. Furthermore, by the definition of $t^{\star}$,

\[
\begin{aligned}
2\left(2B+\lambda^{-1/2}\beta_{t^{\star}}^{1/2}\right)\sigma^{ff'}_{t^{\star}}((x_{t^{\star}},x_{t^{\star}}^{\prime})) &\leq \frac{1}{T}\sum_{t=1}^{T}2\left(2B+\lambda^{-1/2}\beta_{t}^{1/2}\right)\sigma^{ff'}_{t}((x_{t},x_{t}^{\prime}))\\
&\leq \frac{2}{T}\left(2B+\lambda^{-1/2}\beta_{T}^{1/2}\right)\sum_{t=1}^{T}\sigma^{ff'}_{t}((x_{t},x_{t}^{\prime}))\\
&\leq \frac{2}{T}\left(2B+\lambda^{-1/2}\beta_{T}^{1/2}\right)\sqrt{4(T+2)\gamma^{ff'}_{T}}\\
&= \mathcal{O}\left(\frac{\sqrt{\beta_{T}\gamma^{ff'}_{T}}}{\sqrt{T}}\right).
\end{aligned}
\]

The conclusion then follows.
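The selection step used in this proof can be sketched in a few lines. This is a minimal illustration, assuming $t^{\star}$ is defined as the step that minimizes the weighted uncertainty $2(2B+\lambda^{-1/2}\beta_t^{1/2})\sigma^{ff'}_t((x_t,x_t^{\prime}))$ over $t\in\{1,\dots,T\}$, which is the property the first inequality above relies on. The function name `report_index` and all numerical values are hypothetical and not taken from the paper.

```python
import math


def report_index(betas, sigmas, B, lam):
    """Return (t_star, widths), where t_star is the 1-based step minimizing
    2 * (2B + sqrt(beta_t / lam)) * sigma_t and widths lists that quantity
    for every step."""
    widths = [2.0 * (2.0 * B + math.sqrt(b / lam)) * s
              for b, s in zip(betas, sigmas)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return best + 1, widths


# Hypothetical inputs: beta_t = beta_0 * sqrt(t) with beta_0 = 1, and
# made-up posterior standard deviations at the queried pairs.
betas = [math.sqrt(t) for t in range(1, 6)]
sigmas = [1.0, 0.8, 0.5, 0.6, 0.3]
t_star, widths = report_index(betas, sigmas, B=1.0, lam=1.0)

# The minimum never exceeds the average, which is the first inequality
# in the second chain above.
mean_width = sum(widths) / len(widths)
print(t_star, widths[t_star - 1] <= mean_width)
```

Because a minimum is never larger than an average, the bound on the average (the cumulative regret bound divided by $T$) transfers directly to the reported point.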

Appendix L Commonly used specific kernel functions

  • Linear:

    \[ k(x,\bar{x})=x^{\top}\bar{x}. \]

  • Squared Exponential (SE):

    \[ k(x,\bar{x})=\sigma_{\mathrm{SE}}^{2}\exp\left\{-\frac{\left\lVert x-\bar{x}\right\rVert^{2}}{l^{2}}\right\}, \]

    where $\sigma_{\mathrm{SE}}^{2}$ is the variance parameter and $l$ is the lengthscale parameter.

  • Matérn:

    \[ k(x,\bar{x})=\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{2\nu}\frac{\left\lVert x-\bar{x}\right\rVert}{\rho}\right)^{\nu}K_{\nu}\left(\sqrt{2\nu}\frac{\left\lVert x-\bar{x}\right\rVert}{\rho}\right), \]

    where $\rho$ and $\nu$ are the two positive parameters of the kernel function, $\Gamma$ is the gamma function, and $K_{\nu}$ is the modified Bessel function of the second kind. The parameter $\nu$ captures the smoothness of the kernel function.
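The kernels listed above can be written down directly. The following is a minimal sketch using only the Python standard library, not the authors' implementation. It assumes the SE convention of this appendix (no factor $\nicefrac{1}{2}$ in the exponent) and, to avoid evaluating $K_{\nu}$ numerically, restricts the Matérn kernel to the half-integer cases $\nu\in\{\nicefrac{1}{2},\nicefrac{3}{2},\nicefrac{5}{2}\}$, for which the general formula reduces to well-known closed forms. The helper `pair_kernel` is a hypothetical name for the additive kernel $k^{ff'}((x_1,x_1^{\prime}),(x_2,x_2^{\prime}))=k(x_1,x_2)+k(x_1^{\prime},x_2^{\prime})$ used in the proofs.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def se_kernel(x, y, variance=1.0, lengthscale=1.0):
    # Convention of this appendix: exp(-||x - y||^2 / l^2).
    return variance * math.exp(-sq_dist(x, y) / lengthscale ** 2)


def matern_kernel(x, y, rho=1.0, nu=2.5):
    # Closed forms of the general Matern formula for half-integer nu.
    s = math.sqrt(2.0 * nu) * math.sqrt(sq_dist(x, y)) / rho
    if nu == 0.5:
        return math.exp(-s)
    if nu == 1.5:
        return (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("this sketch only supports nu in {0.5, 1.5, 2.5}")


def pair_kernel(k, pair1, pair2):
    """Additive kernel on pairs: k(x1, x2) + k(x1', x2')."""
    (x1, x1p), (x2, x2p) = pair1, pair2
    return k(x1, x2) + k(x1p, x2p)


# With a linear base kernel, the pair kernel equals the inner product of
# the concatenated vectors, so it is again a linear kernel.
p1 = ((1.0, 2.0), (3.0, 4.0))
p2 = ((5.0, 6.0), (7.0, 8.0))
concat = linear_kernel(p1[0] + p1[1], p2[0] + p2[1])
print(pair_kernel(linear_kernel, p1, p2), concat)
```

For general $\nu$, a library routine for the modified Bessel function of the second kind would be needed in place of the closed forms.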

Appendix M Proof of Thm. 5.5

Recall that

\[
\beta(\epsilon,\delta/2,t)=\frac{\underline{\sigma^{\prime}}^{2}}{H_{\sigma}}\left(\beta_{2}(\epsilon,\delta,t)+2\beta_{1}(\epsilon,\delta,t)\right)=\mathcal{O}\left(\sqrt{t\log\frac{t\,\mathcal{N}(\mathcal{B}_{f},\epsilon,\|\cdot\|_{\infty})}{\delta}}+\epsilon t+\epsilon^{2}t\right).
\]

We pick $\epsilon=\nicefrac{1}{T}$, and can thus derive

\[
\beta_{T}=\beta(T^{-1},\delta/2,T)=\mathcal{O}\left(\sqrt{T\log\frac{T\,\mathcal{N}(\mathcal{B}_{f},T^{-1},\|\cdot\|_{\infty})}{\delta}}\right).
\]
  1. If $k$ is a linear kernel, then the corresponding RKHS is a finite-dimensional space and $\log\mathcal{N}(\mathcal{B}_{f},T^{-1},\|\cdot\|_{\infty})=\mathcal{O}\left(\log\frac{1}{\epsilon}\right)=\mathcal{O}\left(\log T\right)$ (see, e.g., (Wu, 2017)). The corresponding $k^{ff'}((x,x^{\prime}),(y,y^{\prime}))=x^{\top}y+{x^{\prime}}^{\top}y^{\prime}=\langle(x,x^{\prime}),(y,y^{\prime})\rangle$ is also linear. Thus, by Thm. 5 in (Srinivas et al., 2012),

    \[ \gamma_{T}^{ff'}=\mathcal{O}(\log T). \]

    Hence,

    \[ R_{T}=\mathcal{O}\left((T\log T)^{1/4+1/2}\right)=\mathcal{O}\left(T^{3/4}(\log T)^{3/4}\right). \]
  2. If $k$ is a squared exponential kernel, then $\log\mathcal{N}(\mathcal{B}_{f},T^{-1},\|\cdot\|_{\infty})=\mathcal{O}\left(\left(\log\frac{1}{\epsilon}\right)^{d+1}\right)=\mathcal{O}\left((\log T)^{d+1}\right)$ (Example 4, (Zhou, 2002)). By Thm. 4 in (Kandasamy et al., 2015), we have

    \[ \gamma_{T}^{ff'}=\mathcal{O}\left((\log T)^{d+1}\right). \]

    Hence,

    \[ R_{T}=\mathcal{O}\left(T^{3/4}(\log T)^{\frac{3}{4}(d+1)}\right). \]
  3. If $k$ is a Matérn kernel, Lem. 3 in (Bull, 2011) implies the equivalence between the RKHS and a Sobolev Hilbert space. We can then apply the rich results on covering number bounds for Sobolev Hilbert spaces (Edmunds & Triebel, 1996). So $\log\mathcal{N}(\mathcal{B}_{f},T^{-1},\|\cdot\|_{\infty})=\mathcal{O}\left(\left(\frac{1}{\epsilon}\right)^{d/\nu}\log\frac{1}{\epsilon}\right)=\mathcal{O}\left(T^{d/\nu}\log T\right)$ (by combining the lower bound in Thm. 5.1 (Xu et al., 2022a) and the convergence rate in Thm. 1 (Bull, 2011)). By Thm. 4 in (Kandasamy et al., 2015), we have

    \[ \gamma_{T}^{ff'}=\mathcal{O}\left(T^{\frac{d(d+1)}{2\nu+d(d+1)}}\log T\right). \]

    Hence,

    \[ R_{T}=\mathcal{O}\left(T^{3/4}(\log T)^{3/4}\,T^{\frac{d}{\nu}\left(\frac{1}{4}+\frac{d+1}{4+2(d+1)d/\nu}\right)}\right)\leq\mathcal{O}\left(T^{3/4}(\log T)^{3/4}\,T^{\frac{1}{4}\frac{d(d+2)}{\nu}}\right). \]
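As a worked check of the exponents in the three cases above, the rates follow by substituting $\beta_T$ and $\gamma_T^{ff'}$ into the bound $R_T=\mathcal{O}\big(\sqrt{\beta_T T\gamma_T^{ff'}}\big)$. The sketch below spells out the linear case and the exponent identity used in the Matérn case. It assumes $\delta$ is a fixed constant, so that $\log\frac{T\mathcal{N}}{\delta}=\mathcal{O}(\log T+\log\mathcal{N})$.

```latex
\begin{aligned}
&\text{Linear: }\ \beta_T=\mathcal{O}\big(\sqrt{T\log T}\big)
  \ \Rightarrow\ \sqrt{\beta_T}=\mathcal{O}\big((T\log T)^{1/4}\big),
  \qquad \sqrt{T\gamma_T^{ff'}}=\mathcal{O}\big(T^{1/2}(\log T)^{1/2}\big),\\
&\qquad R_T=\mathcal{O}\big(\sqrt{\beta_T}\,\sqrt{T\gamma_T^{ff'}}\big)
  =\mathcal{O}\big(T^{1/4+1/2}(\log T)^{1/4+1/2}\big)
  =\mathcal{O}\big(T^{3/4}(\log T)^{3/4}\big).\\[4pt]
&\text{Mat\'ern: }\ \sqrt{\beta_T}=\mathcal{O}\big(T^{\frac14+\frac{d}{4\nu}}(\log T)^{1/4}\big),
  \qquad \sqrt{\gamma_T^{ff'}}=\mathcal{O}\big(T^{\frac{d(d+1)}{2(2\nu+d(d+1))}}(\log T)^{1/2}\big),\\
&\qquad \frac{d(d+1)}{2\,(2\nu+d(d+1))}
  =\frac{d}{\nu}\cdot\frac{d+1}{4+2(d+1)\,d/\nu}.
\end{aligned}
```

Multiplying $\sqrt{\beta_T}$, $\sqrt{T}$, and $\sqrt{\gamma_T^{ff'}}$ in the Matérn case gives the exponent $\frac{3}{4}+\frac{d}{\nu}\big(\frac{1}{4}+\frac{d+1}{4+2(d+1)d/\nu}\big)$ on $T$ and $\frac{3}{4}$ on $\log T$, matching item 3. The squared exponential case is analogous, with every factor of $\log T$ raised to the power $d+1$.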

Appendix N Empirical Evidence for the Order of The Cumulative Regret

Fig. 4 shows the cumulative regret of the POP-BO algorithm. The experimental conditions are the same as in Sec. 6.1. Both the horizontal and vertical axes in Fig. 4 are in log scale, so the slope of the curve roughly represents the exponent of $T$ in the cumulative regret. The order of the cumulative regret lies between $\sqrt{T}$ and $T$ (indeed, close to $T^{3/4}$, judging by the slope in log scale), which is consistent with our theoretical results in Thm. 5.5.

[Figure 4 plot: step (horizontal axis) versus cumulative regret (vertical axis), both in log scale; curves: $T$, $\sqrt{T}$, and $R_T$ of POP-BO.]
Figure 4: Cumulative regret of our algorithm in log scale. For reference, we also plot $\sqrt{T}$ and $T$ in log scale.
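The slope reading described in this appendix can be made quantitative with an ordinary least-squares fit in log-log coordinates. The sketch below is an illustration only: the function name `loglog_slope` is hypothetical, and the regret values are synthetic (generated as $2T^{3/4}$), not the data behind Fig. 4.

```python
import math


def loglog_slope(steps, regrets):
    """Least-squares slope of log(regret) against log(step)."""
    xs = [math.log(t) for t in steps]
    ys = [math.log(r) for r in regrets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic cumulative regret growing like T^(3/4); a constant factor
# only shifts the curve in log scale and does not change the slope.
steps = list(range(1, 31))
regrets = [2.0 * t ** 0.75 for t in steps]
slope = loglog_slope(steps, regrets)
print(round(slope, 6))
```

A fitted slope near $0.5$ would indicate growth like $\sqrt{T}$ and a slope near $1$ would indicate linear regret. On real runs, the early steps are often dropped before fitting because the asymptotic order only shows at larger $T$.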

Appendix O Kernel-Specific Convergence Rate

Similar to the bounds in Appendix M, we can plug in the kernel-specific covering number and maximum information gain to derive the kernel-specific convergence rates in Tab. 3.

Table 3: Kernel-specific convergence rate for $x_{t^{\star}}$.

| Kernel | $f(x^{\star})-f(x_{t^{\star}})$ |
| --- | --- |
| Linear | $\mathcal{O}\left(\frac{(\log T)^{3/4}}{T^{1/4}}\right)$ |
| Squared Exponential | $\mathcal{O}\left(\frac{(\log T)^{\frac{3}{4}(d+1)}}{T^{1/4}}\right)$ |
| Matérn $\left(\nu>\frac{d}{4}\left(3+d+\sqrt{d^{2}+14d+17}\right)=\Theta(d^{2})\right)$ | $\mathcal{O}\left(\frac{(\log T)^{3/4}\,T^{\frac{d}{\nu}\left(\frac{1}{4}+\frac{d+1}{4+2(d+1)d/\nu}\right)}}{T^{1/4}}\right)$ |
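The condition on $\nu$ in the Matérn entry of Tab. 3 is exactly the threshold at which the Matérn rate starts to decay, that is, at which the exponent of $T$ in the corresponding cumulative regret bound drops below $1$. Setting $a=\nicefrac{d}{\nu}$, the requirement $a\left(\frac{1}{4}+\frac{d+1}{4+2(d+1)a}\right)<\frac{1}{4}$ is equivalent to $(d+1)a^{2}+(d+3)a-2<0$, whose positive root gives the stated bound. The sketch below checks this numerically; the function names are hypothetical.

```python
import math


def matern_regret_exponent(d, nu):
    """Exponent of T in the Matern cumulative regret bound:
    3/4 + (d/nu) * (1/4 + (d+1) / (4 + 2(d+1)d/nu))."""
    a = d / nu
    return 0.75 + a * (0.25 + (d + 1) / (4.0 + 2.0 * (d + 1) * a))


def matern_nu_threshold(d):
    """Smoothness threshold from Tab. 3: (d/4)(3 + d + sqrt(d^2 + 14d + 17))."""
    return d / 4.0 * (3 + d + math.sqrt(d * d + 14 * d + 17))


for d in range(1, 6):
    thr = matern_nu_threshold(d)
    # At the threshold the exponent equals 1; above it, it is below 1.
    print(d, round(thr, 4),
          round(matern_regret_exponent(d, thr), 6),
          matern_regret_exponent(d, 2 * thr) < 1.0)
```

For $d=1$ the threshold is $1+\sqrt{2}\approx 2.414$, so among the common half-integer choices only $\nu=\nicefrac{5}{2}$ and above satisfies the condition in one dimension.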

Appendix P More Experimental Results and Details

Selection of Hyperparameters. Three key hyperparameters that influence the performance of POP-BO are the kernel lengthscale, the norm bound, and the confidence level term $\beta$ as shown in Thm. 3.1. We set $\beta=\beta_{0}\sqrt{t}$, where $\beta_{0}$ is set to $1.0$ by default. For the sampled instances from Gaussian processes, the lengthscale is set to the ground truth and the norm bound is set to $1.1$ times the ground truth. For the test function examples, we choose the lengthscale by maximizing the likelihood value over a set of randomly sampled data and set the norm bound to $6$ by default (with the test functions all normalized).

Details on Sampled Instances from Gaussian Process. Specifically, we randomly sample some knot points from a joint Gaussian distribution marginalized from the Gaussian process, and then construct its corresponding minimum-norm interpolant (Maddalena et al., 2021) as the ground truth function.
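The construction just described can be sketched as follows: draw values at the knot points from the Gaussian process prior, then form the minimum-norm interpolant $f(x)=\sum_i \alpha_i k(x,x_i)$ with $\alpha=K^{-1}y$, whose RKHS norm is $\sqrt{y^{\top}K^{-1}y}$. This is a standard-library illustration rather than the authors' code. The SE kernel, the knot locations, the jitter value, and all function names are assumptions made for the example.

```python
import math
import random


def se_kernel(x, y, variance=1.0, lengthscale=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-d2 / lengthscale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) a = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def sample_ground_truth(knots, kernel, rng, jitter=1e-10):
    n = len(knots)
    K = [[kernel(p, q) + (jitter if i == j else 0.0)
          for j, q in enumerate(knots)] for i, p in enumerate(knots)]
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y = L z is a draw from N(0, K), the GP prior restricted to the knots.
    y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    alpha = solve_cholesky(L, y)

    def f(x):
        return sum(a * kernel(x, p) for a, p in zip(alpha, knots))

    rkhs_norm = math.sqrt(max(0.0, sum(a * v for a, v in zip(alpha, y))))
    return f, y, rkhs_norm


rng = random.Random(0)
knots = [[0.0], [1.0], [2.0], [3.0]]  # hypothetical knot points
f, y, rkhs_norm = sample_ground_truth(knots, se_kernel, rng)
norm_bound = 1.1 * rkhs_norm  # norm bound set to 1.1 times the ground truth
```

Because the interpolant's RKHS norm is available in closed form, the norm bound used for the sampled instances can be computed directly from the sampled function.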

Empirical Method for Reporting a Solution. In the experiment of test function optimization, we report the point that maximizes the minimum-norm maximum likelihood estimator $\hat{f}^{\mathrm{MLE}}_{t}$, which achieves better empirical performance.

Solution Report Method for Baselines. The approach to reporting a solution is the same as in the original paper of the baseline algorithm if it is mentioned. Therefore, for the baseline qEUBO (Astudillo et al., 2023), we report the solution that maximizes the expected objective value conditioned on the historical samples. For the baseline SGP (Takeno et al., 2023), we report the first point of the duel proposed by the algorithm in step $t$. For the baseline DTS (González et al., 2017), we report the Condorcet winner.

Effect of Hyperparameters. We conducted more experiments to assess the effect of hyperparameters. We observe that the most influential hyperparameters are the norm bound $B$ and the confidence level $\beta_{t}$. The larger the norm bound $B$ is, the more variance the estimated function has. If $B$ is set too large, the suboptimality of the reported solution tends to converge more slowly. In practice, $\beta_{t}$ can be set to $\beta_{0}\sqrt{t}$, where $\beta_{0}$ is a fixed constant, and it determines the level of exploration. The larger $\beta_{0}$ is, the more explorative the algorithm is, which may lead to higher cumulative regret. However, setting $\beta_{0}$ very small may cause weak exploration and make the suboptimality of the reported solution converge more slowly.

P.1 Experimental Results for Higher-Dimensional Problems

P.1.1 Higher-Dimensional Problems Sampled from Gaussian Process

We consider the optimization of 7777-dimensional black-box function sampled from a Gaussian process with kernel function as shown in Eq. (106),

k(x,x¯)=σSE2exp{xx¯2l2}𝑘𝑥¯𝑥superscriptsubscript𝜎SE2superscriptdelimited-∥∥𝑥¯𝑥2superscript𝑙2k(x,\bar{x})=\sigma_{\mathrm{SE}}^{2}\exp{\left\{-\frac{\left\lVert x-\bar{x}% \right\rVert^{2}}{l^{2}}\right\}}italic_k ( italic_x , over¯ start_ARG italic_x end_ARG ) = italic_σ start_POSTSUBSCRIPT roman_SE end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_exp { - divide start_ARG ∥ italic_x - over¯ start_ARG italic_x end_ARG ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_l start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } (106)

where $\sigma_{\mathrm{SE}}^{2}=9.0$ and $l=5\sqrt{7}$. The optimization domain is set to $[0,10]^{7}$. We run 20 randomly sampled instances for 100 steps. The average update time for each step $t$ is only 11.0 seconds on a personal computer with one Intel64 Family 6 Model 142 Stepping 12 GenuineIntel 1803 MHz processor and 16.0 GB RAM. This is comparatively small considering that each query to the comparison oracle can be very expensive in practice (e.g., heating the room up to a certain temperature to evaluate occupant comfort, which may take tens of minutes). We compare our method to the SGP baseline.
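The following sketch illustrates the kernel in Eq. (106) with the stated values $\sigma_{\mathrm{SE}}^{2}=9.0$ and $l=5\sqrt{7}$, and draws one Gaussian process sample at a finite set of random points in $[0,10]^{7}$. The function name `se_kernel`, the number of points, the random seed, and the jitter-plus-Cholesky sampling step are assumptions made for illustration; they are not necessarily how the test instances in our experiments were generated.

```python
import numpy as np

SIGMA_SE_SQ = 9.0                   # sigma_SE^2 from the text
LENGTHSCALE = 5.0 * np.sqrt(7.0)    # l = 5 * sqrt(7) from the text
DIM = 7


def se_kernel(X, X_bar, sigma_sq=SIGMA_SE_SQ, lengthscale=LENGTHSCALE):
    """Evaluate sigma_sq * exp(-||x - x_bar||^2 / lengthscale^2) pairwise."""
    X = np.atleast_2d(X)
    X_bar = np.atleast_2d(X_bar)
    # Pairwise squared Euclidean distances, shape (len(X), len(X_bar)).
    sq_dists = np.sum((X[:, None, :] - X_bar[None, :, :]) ** 2, axis=-1)
    return sigma_sq * np.exp(-sq_dists / lengthscale**2)


rng = np.random.default_rng(0)               # seed chosen arbitrarily
X = rng.uniform(0.0, 10.0, size=(50, DIM))   # 50 illustrative points in [0, 10]^7
K = se_kernel(X, X)

# Small jitter (an assumption) keeps the Cholesky factorization stable.
chol = np.linalg.cholesky(K + 1e-6 * np.eye(len(X)))
f_values = chol @ rng.standard_normal(len(X))  # one zero-mean GP draw at X
```

Note that with $l=5\sqrt{7}$, the length scale equals half the diagonal of the domain $[0,10]^{7}$, so the center of the domain is exactly one length scale away from every corner.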

Figure 5: Cumulative regret in log scale and the suboptimality of the reported solution in linear scale for a 7-dimensional problem sampled from a Gaussian process. For reference, we also plot $T$ in the cumulative regret plot in log scale. The shaded areas represent $\pm 0.2$ standard deviation.

Fig. 5 shows the cumulative regret (in log scale) and the suboptimality of the reported solution for our POP-BO algorithm, where the reported solution is derived by maximizing the maximum likelihood estimate function. It can be clearly seen that our algorithm achieves both sublinear regret growth and fast convergence of the suboptimality of the reported solution in this 7-dimensional problem. Interestingly, the suboptimality of SGP converges similarly to that of our method before 50 steps, but gets worse after 50 steps. This is because SGP ignores the randomness in the preference feedback, which leads to misbelief in the function difference value, and such misbelief is more significant when the function difference value is small.

P.1.2 Higher-Dimensional Test Problem

In this section, we further consider the optimization of the 6-dimensional Ackley function as used in (Astudillo et al., 2023). For this problem, we compare the POP-BO algorithm to the qEUBO algorithm proposed in (Astudillo et al., 2023). Fig. 6 shows the cumulative regret and the suboptimality of the reported solution. In this particular problem, qEUBO performs better than our POP-BO algorithm in terms of cumulative regret, while our POP-BO algorithm performs slightly better than qEUBO in terms of the suboptimality of the reported solution.

Figure 6: Cumulative regret and the suboptimality of the reported solution for the 6-dimensional Ackley function optimization problem, where the shaded areas represent $\pm 0.5$ standard deviation.

P.2 Occupant Thermal Comfort Optimization

P.2.1 Two-Dimensional Comfort Optimization

An accurate model of human thermal comfort is crucial for improving occupants’ comfort while saving energy in buildings. However, establishing such a model has proven to be a complex and challenging task (Zhang et al., 2024), and standard offline models ignore the individual differences among occupants. In this section, we consider the real-world problem of maximizing occupant thermal comfort directly from thermal preference feedback. To emulate real human thermal sensation, we use the well-known and widely adopted Predicted Mean Vote (PMV) model (Fanger et al., 1970) as the ground truth and generate the preference feedback according to the Bernoulli model in Assumption 2.5. We optimize the indoor air temperature and air speed, which are the two major factors that influence thermal comfort and are controllable by HVAC (Heating, Ventilation, and Air Conditioning) systems and fans. Indeed, tuning these two factors has been proven effective in providing thermal comfort while minimizing energy consumption (Lyu et al., 2023). The result is shown in Fig. 7, where the mean is taken over 30 simulation instances. It can be seen that our method stably achieves superior performance in optimizing human thermal comfort, which indicates its potential to deal with preferential feedback in real-world applications. It is also worth noting that although qEUBO achieves slightly better performance in terms of the convergence of the reported solution, the cumulative regret of qEUBO is almost twice that of POP-BO. This means our method is more favorable in applications where online performance during the optimization is also critical, such as online tuning of HVAC systems.
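The simulated preference oracle can be sketched as follows. This is an illustration under two explicit assumptions: the Bernoulli preference probability is taken to be a logistic function of the difference in ground-truth values, and the ground truth is replaced by a hypothetical quadratic comfort score `toy_comfort` rather than the PMV model used in our experiments. All function names and numerical constants are placeholders.

```python
import numpy as np


def toy_comfort(x):
    """Hypothetical stand-in for the ground-truth comfort score (NOT the PMV model).

    x = (air temperature in deg C, air speed in m/s); the peak at
    (24.0, 0.2) and the scaling constants are arbitrary illustrative choices.
    """
    temp, speed = x
    return -((temp - 24.0) / 2.0) ** 2 - ((speed - 0.2) / 0.3) ** 2


def preference_probability(f, x, x_prime):
    """Probability that x is preferred to x_prime under an assumed logistic link."""
    return 1.0 / (1.0 + np.exp(-(f(x) - f(x_prime))))


def sample_preference(f, x, x_prime, rng):
    """Draw one Bernoulli outcome: 1 if x is reported as preferred, else 0."""
    return int(rng.random() < preference_probability(f, x, x_prime))


rng = np.random.default_rng(0)            # seed chosen arbitrarily
x, x_prime = (24.0, 0.2), (27.0, 0.2)     # two candidate indoor conditions
p = preference_probability(toy_comfort, x, x_prime)
votes = [sample_preference(toy_comfort, x, x_prime, rng) for _ in range(1000)]
```

Because the feedback is random, a clearly better condition is still occasionally reported as worse; the closer the two ground-truth values are, the closer the preference probability is to 0.5 and the noisier the comparisons become.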

Figure 7: Cumulative regret and the suboptimality of reported solution of different algorithms for thermal comfort optimization.

P.2.2 Scalability to Higher Dimensions

To demonstrate the scalability of POP-BO in this real-world comfort optimization problem, we additionally tune the mean radiant temperature and relative humidity, which results in a four-dimensional black-box optimization problem. The result is shown in Fig. 8. It can be observed that increasing the dimensionality does not drastically decrease the convergence rate of our method. Furthermore, the baseline method qEUBO decreases the objective value very fast in the initial steps, but still appears very oscillatory after 10 steps. In contrast, our method converges faster than SGP and does not exhibit the oscillation seen with qEUBO.

Figure 8: Cumulative regret and the suboptimality of reported solution of different algorithms for the four-dimensional thermal comfort optimization problem.

P.3 Details About the Results in Tab. 2

The cumulative regret and evolution of suboptimality for the different test problems in Tab. 2 are shown in Fig. 9. Since the considered problems have only 2-dimensional inputs, and since in applications of Bayesian optimization it is typically desired to obtain a set of solutions with objective values as close to the optimal value as possible, we only consider 30 steps here. As shown in Tab. 2, other baselines make limited progress in terms of the suboptimality of the reported solution within only 30 steps (partially due to the ‘adversarial’ properties of the test functions, i.e., severe non-convexity and multiple local maxima). In sharp contrast, our POP-BO algorithm makes significant progress in reducing the suboptimality of the reported solution by balancing exploration and exploitation and by estimating the best solution in a principled way.

Figure 9: Cumulative regret and the suboptimality of reported solution of POP-BO algorithm for the different test problems in Tab. 2.

To provide more insights into POP-BO’s performance across different settings, we compare our algorithm’s evolution of cumulative regret and suboptimality to other baseline methods for each test problem in Fig. 10 and Fig. 11. It can be observed that our method may perform slightly worse than some baselines in certain problems. For example, our method performs slightly worse than qEUBO in the Bukin problem in terms of suboptimality. However, our method performs stably and is consistently one of the best in all the test problems in terms of the suboptimality.

Figure 10: Cumulative regret and the suboptimality of reported solution of different algorithms for the test problems Beale, Branin, Bukin, and Cross-in-Tray in Tab. 2.

Figure 11: Cumulative regret and the suboptimality of reported solution of different algorithms for the test problems Eggholder, Holder Table, and Levy13 in Tab. 2.

Appendix Q Additional Contributions as Compared to (Mehta et al., 2023)

Notably, (Mehta et al., 2023) proposes the Borda-AE algorithm, which directly learns the winning probability function using kernel ridge regression. This key design allows the authors to derive an information-theoretic convergence rate and an efficient computation method without learning the underlying reward function.

However, (Mehta et al., 2023) has key limitations and our paper makes additional contributions in the following two aspects.

  1. Cumulative regret bound. There are two possible ways to define cumulative regret. One way is to define the (partial) cumulative regret as the sum of the suboptimality of only $x_t$ (that is, $\sum_{t=1}^{T}(f(x^{\star})-f(x_t))$). With this (partial) cumulative regret definition, the Borda-AE algorithm can provide a sublinear (partial) cumulative regret bound, although it has linear growth in the cumulative regret of the compared point sequence $\{x_t^{\prime}\}_{t=1}^{T}$. However, in many practical online learning applications, it is desired to control the suboptimality of both the $x_t$ and $x_t^{\prime}$ sequences. For example, when tuning the thermal/visual comfort of room occupants, we require the occupants to experience both the $x_t$ and $x_t^{\prime}$ conditions for comparison purposes, and the suboptimality (which is linked to discomfort) caused by both $x_t$ and $x_t^{\prime}$ needs to be managed.

    Therefore, it is more practically relevant to define the (total) cumulative regret as the total cumulative suboptimality of both the $x_t$ and $x_t^{\prime}$ sequences (that is, $\sum_{t=1}^{T}(f(x^{\star})-f(x_t))+\sum_{t=1}^{T}(f(x^{\star})-f(x_t^{\prime}))$). Interestingly, since $x_t^{\prime}=x_{t-1}$ by the design of our POP-BO algorithm, this (total) cumulative regret essentially reduces to $2\sum_{t=1}^{T}(f(x^{\star})-f(x_t))$ (up to the contribution of the initial point), for which we provide our sublinear cumulative regret bound. As such, the (total) cumulative regret bound provided by our paper is stronger than the (partial) cumulative regret bound that could be obtained by (Mehta et al., 2023).

  2. Applicability to online learning problems. Following the last point, (Mehta et al., 2023) is not applicable to the online learning problem since, in line 6 of the Borda-AE algorithm, $a_t^{\prime}$ is uniformly sampled from the action space, which leads to linear growth of the cumulative regret. This means Borda-AE has very poor online performance and cannot be applied to an online learning problem. For example, in building thermal comfort tuning, we also want to control the discomfort caused during the tuning process. In contrast, our POP-BO algorithm has good online performance, with both a theoretical bound on cumulative regret (Thm. 5.2) and empirical evidence of small cumulative regret (Fig. 2).
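To make the two regret notions in the list above concrete, the sketch below computes the (partial) and (total) cumulative regret for a short, made-up sequence of function values with $x_t^{\prime}=x_{t-1}$. The helper names, the numerical values, and the value assigned to the initial point $x_0$ are all hypothetical and serve only to illustrate the definitions.

```python
import numpy as np


def partial_cumulative_regret(f_opt, f_x):
    """Running sum of f(x*) - f(x_t) over the query sequence x_1, ..., x_T."""
    return np.cumsum(f_opt - np.asarray(f_x, dtype=float))


def total_cumulative_regret(f_opt, f_x, f_x_prime):
    """Running sum of the suboptimality of both x_t and its comparison point x_t'."""
    gaps_x = f_opt - np.asarray(f_x, dtype=float)
    gaps_x_prime = f_opt - np.asarray(f_x_prime, dtype=float)
    return np.cumsum(gaps_x + gaps_x_prime)


f_opt = 1.0                                  # f(x*), illustrative
f_x0 = 0.0                                   # f(x_0) at a hypothetical initial point
f_x = np.array([0.2, 0.5, 0.8, 0.9, 1.0])    # f(x_1), ..., f(x_5), illustrative

# Comparison points follow x_t' = x_{t-1}, so their values are f_x shifted by one step.
f_x_prime = np.concatenate(([f_x0], f_x[:-1]))

partial = partial_cumulative_regret(f_opt, f_x)
total = total_cumulative_regret(f_opt, f_x, f_x_prime)
```

In this example the total regret after $T$ steps is at most twice the partial regret plus the suboptimality of $x_0$, which is why a sublinear bound on the partial sum carries over to the total. If the comparison points were instead drawn uniformly from the domain, their suboptimality would not shrink over time and the second sum would grow linearly in $T$.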