
Dolomites Research Notes on Approximation — Special Issue dedicated to Robert Schaback on the occasion of his 75th birthday, Volume 15 · 2022 · Pages 65–77

Error Bounds and the Asymptotic Setting in Kernel-Based Approximation

Toni Karvonen¹

¹ Department of Mathematics and Statistics, University of Helsinki, Finland. Email: [email protected].

Communicated by Gabriele Santin

Abstract
We use ideas from Gaussian process regression to derive computable error bounds that can be used as
stopping criteria in kernel-based approximation. The proposed bounds are based on maximum likelihood
estimation and cross-validation of a kernel scale parameter and take the form of a product of the scale
parameter estimate and the worst-case approximation error in the reproducing kernel Hilbert space
induced by the kernel. We also use known results on the so-called asymptotic setting to argue that such
worst-case type error bounds are not necessarily conservative.

1 Introduction
Let K : Ω × Ω → R be a strictly positive-definite kernel on an infinite set Ω and H(K) its reproducing kernel Hilbert space (RKHS)
equipped with an inner product 〈·, ·〉K and the resulting norm ∥·∥K . The RKHS is a Hilbert space consisting of real-valued functions
defined on Ω such that
(i) kernel translates are elements of H(K), in that K(·, x) ∈ H(K) for every x ∈ Ω, and
(ii) the kernel has the reproducing property, which states that 〈 f , K(·, x)〉K = f (x) for every f ∈ H(K) and x ∈ Ω.
See [15] for a review of RKHSs. Let {x_i}_{i=1}^∞ be a set of distinct points in Ω and X_n = {x_1, ..., x_n} the set of the first n of them. For every n ∈ N and any function f : Ω → R there exists a unique minimum-norm interpolant I_n f = I_{X_n} f ∈ H(K):

    I_n f = I_{X_n} f = arg min_{s ∈ H(K)} { ∥s∥_K : s|_{X_n} = f|_{X_n} } = f_n^T K_{n,n}^{-1} K_n(·),    (1)
where f_n = (f(x_1), ..., f(x_n)) ∈ R^n, K_{n,n} ∈ R^{n×n} is the positive-definite kernel matrix with elements (K_{n,n})_{ij} = K(x_i, x_j), and K_n(·) = (K(·, x_1), ..., K(·, x_n)) ∈ R^n. This interpolant is often called the kernel interpolant or, if the kernel K is radial, the radial basis function interpolant. The kernel interpolant is the unique function in the span of K(·, x_1), ..., K(·, x_n) that interpolates f at X_n. The power function

    P_n(x) = P_{X_n}(x) = sup_{∥f∥_K ≤ 1} |f(x) − (I_n f)(x)| = sup_{f ∈ H(K), f ≠ 0} |f(x) − (I_n f)(x)| / ∥f∥_K = √( K(x, x) − K_n(x)^T K_{n,n}^{-1} K_n(x) )    (2)

quantifies the approximation quality of the kernel interpolant in H(K).


Let L be a bounded linear functional on H(K). One can show that the approximation (or quadrature rule) of L(f) obtained by applying L to the kernel interpolant I_n f is worst-case optimal in H(K). That is, the kernel quadrature rule

    Q_{L,n}(f) = L(I_n f) = f_n^T K_{n,n}^{-1} L(K_n) = Σ_{i=1}^n w_{n,i} f(x_i),    (3)

where w_n = (w_{n,1}, ..., w_{n,n}) = K_{n,n}^{-1} L(K_n) ∈ R^n are the quadrature weights, is the unique worst-case optimal linear approximation in H(K) given standard information at X_n:

    w_n = arg min_{v_1, ..., v_n ∈ R} sup_{∥f∥_K ≤ 1} | L(f) − Σ_{i=1}^n v_i f(x_i) |.

It also follows that the worst-case error of Q_{L,n} in H(K) is

    E_n(L) = wce_{H(K)}(Q_{L,n}) = sup_{∥f∥_K ≤ 1} |L(f) − Q_{L,n}(f)| = sup_{f ∈ H(K), f ≠ 0} |L(f) − Q_{L,n}(f)| / ∥f∥_K = √( K_{L,L} − L(K_n)^T K_{n,n}^{-1} L(K_n) ),    (4)

where K L,L = L(K L ) and the function K L is defined via K L (x) = L(K(·, x)). Note that the kernel interpolant and the power
function at x ∈ Ω are recovered from (3) and (4) by selecting the point evaluation functional L = δ x defined as δ x ( f ) = f (x).
That is, Q δ x ,n ( f ) = (I n f )(x) and En (δ x ) = wceK (Q δ x ,n ) = Pn (x) for every x ∈ Ω. For a more thorough review of kernel-based
interpolation and approximation, see [13, 26] and [7, Chapter 8] as well as [12, Chapter 10].
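As a minimal illustration, the quantities in (1)–(4) can be computed with standard dense linear algebra. The following Python sketch is only a sketch under stated assumptions: it assumes NumPy and SciPy and a user-supplied routine kernel(a, b) that returns the matrix with entries K(a_i, b_j); a small jitter guards against numerical loss of positive definiteness.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kernel_interpolation(kernel, x_nodes, f_values, x_eval):
    """Kernel interpolant (1) and power function (2) evaluated at x_eval."""
    n = len(x_nodes)
    K_nn = kernel(x_nodes, x_nodes) + 1e-12 * np.eye(n)   # kernel matrix with a small jitter
    chol = cho_factor(K_nn)
    K_ne = kernel(x_eval, x_nodes)                        # cross-kernel matrix, row i is K_n(x_eval[i])^T
    interpolant = K_ne @ cho_solve(chol, f_values)        # (I_n f)(x_eval), cf. (1)
    power2 = kernel(x_eval, x_eval).diagonal() - np.sum(K_ne * cho_solve(chol, K_ne.T).T, axis=1)
    return interpolant, np.sqrt(np.maximum(power2, 0.0))  # P_n at x_eval, cf. (2)

def kernel_quadrature(kernel, x_nodes, L_of_Kn, K_LL):
    """Quadrature weights (3) and worst-case error (4); L(K_n) and K_{L,L} are supplied in closed form."""
    n = len(x_nodes)
    chol = cho_factor(kernel(x_nodes, x_nodes) + 1e-12 * np.eye(n))
    weights = cho_solve(chol, L_of_Kn)                    # w_n = K_{n,n}^{-1} L(K_n)
    wce = np.sqrt(max(K_LL - L_of_Kn @ weights, 0.0))     # E_n(L), cf. (4)
    return weights, wce
```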
From (2) and (4) it immediately follows that
| f (x) − (I n f )(x)| ≤ ∥ f ∥K Pn (x) and |L( f ) − Q L,n ( f )| ≤ ∥ f ∥K En (L) (EB)
for every f ∈ H(K) and x ∈ Ω. As is well known [6, Section 5.1], these error bounds can be improved to
| f (x) − (I n f )(x)| ≤ ∥ f − I n f ∥K Pn (x) and |L( f ) − Q L,n ( f )| ≤ ∥ f − I n f ∥K En (L) (I-EB)
for every f ∈ H(K) and x ∈ Ω. The first bound in (I-EB) is proved by setting f = f − I n f in (EB) and observing that I n ( f − I n f )
is identically zero because f − I n f vanishes on X n while the second one follows from the same argument combined with
L( f ) − Q L,n ( f ) = L( f − I n f ). Note that ∥ f − I n f ∥K ≤ ∥ f ∥K since f − I n f and I n f are H(K)-orthogonal, which one can prove by
using (1) and the reproducing property. It is a common scenario that, after being provided with an absolute error tolerance ϵ > 0
and a function f , an approximation algorithm Q n ( f ) (e.g., any standard numerical integration routine) proceeds to increase n
until its internal error estimation indicates that |L( f ) − Q n ( f )| ≤ ϵ holds. Due to the presence of a worst-case error which is
available in closed form as long as K L and K L,L can be computed, it would be tempting to use the error bounds (EB) or (I-EB) to
terminate a kernel-based approximation method. However, two problems arise:
(P1) The bounds are worst-case and thus potentially conservative. The bounds in (EB) are clearly sub-optimal for fixed f because
the rates of decay of the right-hand sides do not depend on f . The improved bounds in (I-EB) are obviously better, but it is
still not clear if such worst-case type bounds are optimal in some sense.
(P2) The bounds, even if they are accepted to be useful, are not computable. Although the worst-case error has a simple
linear-algebraic expression given in (2) or (4), computation or estimation of either of the norms ∥ f ∥K or ∥ f − I n f ∥K is not
possible with complete certainty, in the sense that there does not exist a function c : Rn → [0, ∞) satisfying ∥ f ∥K ≤ c( f |X n )
or ∥ f − I n f ∥K ≤ c( f |X n ) for every f ∈ H(K) (see Proposition 3.1). One therefore has to be content with bounds that fail
for some elements of the RKHS. Some discussion of this topic in the context of kernel-based interpolation can be found
in [6, Section 5.1].
The design of termination rules is thus a constant tug of war: a conservative rule (i.e., the error bounds are “large”) may be
comprehensive, in that it terminates early for few elements of the function space of interest but runs the risk of terminating
far too late for most elements, and thus wasting computational resources; an optimistic rule (i.e., the bounds are “small”)
saves computational resources but may terminate early for many elements and thus result in overconfidence in the quality of
approximation.
The purpose of this article is to discuss these two problems, recall certain theoretical results on the relation between the
worst-case and asymptotic settings of error analysis and propose a solution, which by the nature of the task is bound to be to
some extent unsatisfactory and heuristic:
• Section 2 recalls a result by Trojan from [24, Chapter 10] which, in the language of this article, states that the RKHS
contains functions for which |L( f ) − Q L,n ( f )| decays (as n → ∞) with a rate that is arbitrarily close to the rate of decay
of En (L). This provides a form of a resolution, albeit weak, to (P1): it is not a problem to make use of bounds based on
worst-case analysis because, uniformly over f ∈ H(K), the rate of decay of the worst-case error is not slower than that for
individual fixed f . Section 2 owes its existence to the recent work of Owen and Pan [14] who were directed to Trojan’s
result in [24] by Erich Novak (see also [16, Chapter 6]).
• Section 3, which takes its inspiration from recent work on approximation of deterministic functions with Gaussian
processes [8, 25], proposes using maximum likelihood estimation and cross-validation to construct error bounds which,
while optimistic, are “likely” to be valid for “many” elements of the RKHS. The proposed error bounds are c(f, n)P_n(x) and c(f, n)E_n(L) with

    c(f, n) = c_ML(f, n) = √( f_n^T K_{n,n}^{-1} f_n / n )   or   c(f, n) = c_CV(f, n) = √( (1/n) Σ_{i=1}^n [f(x_i) − (I_{n,i} f)(x_i)]^2 / P_{n,i}(x_i)^2 ),    (5)

where I n,i and Pn,i stand for the kernel interpolant and the power function based on the points X n \ {x i }. The justification
for the use of the coefficients in (5) is related to the d/2-gap observed by Schaback and Wendland [21] and the supercon-
vergence of kernel-based approximation [20]. A numerical example in Section 4 demonstrates that these error bounds are
not completely without merit. Stopping criteria derived from maximum likelihood estimation and cross-validation have
been recently used in kernel-based integration by Rathinavel and Hickernell [18].
Most theoretical results in this article are well known (and more general, requiring only that the function space be a Hilbert
or Banach space) while those that are new are not particularly difficult to prove. Importantly, our arguments for the use of
the coefficients in (5) are not mathematically rigorous and it is unclear how, and in what sense, they could be made rigorous.
Although we focus on functions in the RKHS, the misspecification results in [1, 8, 11, 28] make some of the discussion applicable
also to functions outside the RKHS if the RKHS is norm-equivalent to a Sobolev space.
Let λ and ν be positive parameters. Throughout the article the stationary Matérn kernel

    K(x, y) = (2^{1−ν} / Γ(ν)) ( √(2ν) ∥x − y∥ / λ )^ν K_ν( √(2ν) ∥x − y∥ / λ )   for x, y ∈ R^d,    (6)


where Γ is the Gamma function and K_ν the modified Bessel function of the second kind of order ν, is used as an example because its approximation properties are well understood. On any metric space Ω define the fill-distance

    h_{n,Ω} = sup_{x ∈ Ω} min_{i=1,...,n} ∥x − x_i∥.

On any sufficiently regular subset Ω of R^d (e.g., Ω = [0, 1]^d) the RKHS of the Matérn kernel (6) is norm-equivalent to the fractional Sobolev space H^{ν+d/2}(Ω). The parameter λ only affects the norm-equivalence constants. Assume that Ω is bounded and let L be the integration functional

    L(f) = ∫_Ω f(x) p(x) dx   for any p ∈ L^∞(Ω).
It is well known that there is a constant C > 0 such that

    sup_{x ∈ Ω} |f(x) − (I_n f)(x)| ≤ C h_{n,Ω}^ν ∥f∥_{H^{ν+d/2}(Ω)}   and   |L(f) − Q_{L,n}(f)| ≤ C h_{n,Ω}^{ν+1/2} ∥f∥_{H^{ν+d/2}(Ω)}    (7)

for every n ∈ N and f ∈ H^{ν+d/2}(Ω). If the point set {x_i}_{i=1}^∞ is quasi-uniform, in that h_{n,Ω} = Θ(n^{−1/d}), these bounds become

    sup_{x ∈ Ω} |f(x) − (I_n f)(x)| ≤ C n^{−ν/d} ∥f∥_{H^{ν+d/2}(Ω)}   and   |L(f) − Q_{L,n}(f)| ≤ C n^{−ν/d−1/2} ∥f∥_{H^{ν+d/2}(Ω)}    (8)

for a different constant C > 0. In the quasi-uniform case it also holds that

    sup_{x ∈ Ω} P_n(x) = Θ(n^{−ν/d})   and   E_n(L) = Θ(n^{−ν/d−1/2}).    (9)

There is a non-negligible subset of H^{ν+d/2}(Ω) for which the algebraic rates in (7) and (8) can be essentially doubled [26, Section 11.5]. Note that the rates above are valid not only for Matérn kernels but for any kernel whose RKHS is norm-equivalent to a Sobolev space.
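For concreteness, a sketch of the Matérn kernel (6) and of the fill-distance, the latter approximated over a finite candidate grid, is given below. It assumes NumPy and SciPy (scipy.special.kv for the modified Bessel function) and points stored as rows of two-dimensional arrays.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(x, y, nu=1.5, lam=1.0):
    """Matérn kernel (6) between point sets x of shape (m, d) and y of shape (n, d)."""
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)   # pairwise distances
    s = np.sqrt(2.0 * nu) * r / lam
    K = np.ones_like(s)                                          # K(x, x) = 1 in the limit s -> 0
    nz = s > 0.0
    K[nz] = (2.0 ** (1.0 - nu) / gamma(nu)) * s[nz] ** nu * kv(nu, s[nz])
    return K

def fill_distance(x_nodes, candidates):
    """Approximate h_{n,Omega} = sup_x min_i ||x - x_i|| over a finite candidate set."""
    d = np.linalg.norm(candidates[:, None, :] - x_nodes[None, :, :], axis=-1)
    return d.min(axis=1).max()

nodes = np.linspace(0.0, 1.0, 20)[:, None]      # quasi-uniform points on [0, 1]
grid = np.linspace(0.0, 1.0, 2001)[:, None]
print(fill_distance(nodes, grid))               # approximately 1 / (2 * 19), i.e. half the spacing
```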

2 Asymptotic Setting
The worst-case error E_n(L) = sup_{∥f∥_K ≤ 1} |L(f) − Q_{L,n}(f)| in (4) is by its very nature adversarial: for each n ∈ N there is a fooling function f_n in the unit ball of H(K) for which |L(f_n) − Q_{L,n}(f_n)| = E_n(L) and, importantly, this function can depend on n. But in the asymptotic setting [24, Chapter 10] that this article is concerned with there is a single fixed f ∈ H(K) for which the error is to be estimated. It is not unreasonable to expect that the worst-case error decays slower than the error for a fixed element of the RKHS, in that

    lim_{n→∞} |L(f) − Q_{L,n}(f)| / E_n(L) = 0   for every f ∈ H(K),    (10)

provided that the points {x_i}_{i=1}^∞ are suitable. This is indeed the case because from (I-EB) one obtains

    |L(f) − Q_{L,n}(f)| / E_n(L) ≤ ( ∥f − I_n f∥_K E_n(L) ) / E_n(L) = ∥f − I_n f∥_K    (11)

and, as is well known, ∥f − I_n f∥_K tends to zero as n → ∞ if and only if the power function does. We include a proof of this result for completeness (see, e.g., Theorem 8.37 in [7] for the case of a continuous K).
Proposition 2.1. The following statements are equivalent:
(i) limn→∞ Pn (x) = 0 for every x ∈ Ω.
(ii) limn→∞ ∥ f − I n f ∥K = 0 for every f ∈ H(K).
Moreover, if Ω is a metric space and K : Ω × Ω → R is continuous, then (i) and (ii) are implied by {x_i}_{i=1}^∞ being dense in Ω.

Proof. Assume that (i) holds and let f ∈ H(K). From (EB) it follows that

    |f(x) − (I_n f)(x)| ≤ ∥f∥_K P_n(x) → 0   as n → ∞

for every x ∈ Ω, so that I_n f → f pointwise. Because I_{n+1} f|_{X_n} = f|_{X_n} = I_n f|_{X_n}, it follows from the minimum-norm interpolation property (1) that (a) the sequence (∥I_n f∥_K)_{n=1}^∞ is increasing and (b) ∥I_n f∥_K ≤ ∥f∥_K for every n ∈ N. By (a) and (b) the sequence (∥I_n f∥_K)_{n=1}^∞ converges and, because I_m f − I_n f vanishes on X_n and is therefore H(K)-orthogonal to I_n f, we have ∥I_m f − I_n f∥_K^2 = ∥I_m f∥_K^2 − ∥I_n f∥_K^2 for m ≥ n, so that (I_n f)_{n=1}^∞ is a Cauchy sequence. Therefore there is g ∈ H(K) such that lim_{n→∞} ∥g − I_n f∥_K = 0 since H(K) is a Hilbert space and thus every Cauchy sequence in H(K) tends to an element of H(K). From the reproducing property and the Cauchy–Schwarz inequality it then follows that

    |g(x) − (I_n f)(x)| = |〈g − I_n f, K(·, x)〉_K| ≤ ∥g − I_n f∥_K √(K(x, x)) → 0   as n → ∞

for every x ∈ Ω, so that I_n f → g pointwise. Therefore g = f and we conclude that lim_{n→∞} ∥f − I_n f∥_K = 0. That (ii) implies (i) follows from writing, by using, for example, the reproducing property and the last equality in (2), the power function as P_n(x) = ∥f − I_n f∥_K = ∥K(·, x) − I_n K(·, x)∥_K for the fixed function f = K(·, x) ∈ H(K).


Finally, suppose that (Ω, d_Ω) is a metric space and K is continuous. Since f − I_n f vanishes on X_n and has H(K)-norm at most ∥f∥_K, it follows from the worst-case characterisation of the power function in (2) that

    P_n(x) = sup_{∥f∥_K ≤ 1} |f(x) − (I_n f)(x)| ≤ sup{ |f(x)| : ∥f∥_K ≤ 1 and f|_{X_n} = 0 } ≤ sup{ |f(x)| : ∥f∥_K ≤ 1 and f(x_m) = 0 } = P_{{x_m}}(x)

for any m ≤ n, where P_{{x_m}}(x) denotes the power function based on a single point. (In fact, the first inequality above is an equality [e.g., 10, Satz 2.2.14].) The explicit linear algebraic expression for the power function in (2) then yields

    P_n(x) ≤ P_{{x_m}}(x) = √( K(x, x) − K(x, x_m)^2 / K(x_m, x_m) ).    (12)

If {x_i}_{i=1}^∞ is dense in Ω, for every x ∈ Ω there exists a subsequence (i_n)_{n=1}^∞ such that i_n ≤ n and d_Ω(x, x_{i_n}) → 0 as n → ∞. The continuity of K and (12) with x_m = x_{i_n} yield the last claim.

However, it is also well known that ∥f − I_n f∥_K can tend to zero arbitrarily slowly, in that for every positive sequence (δ_n)_{n=1}^∞ tending to zero there exists f ∈ H(K) such that ∥f − I_n f∥_K ≥ δ_n for every n ∈ N [e.g., 7, Exercise 8.64]. This result (a somewhat
roundabout proof of a version of which is given below) and (11) suggest that the error bounds in (EB) are optimal even in the
asymptotic setting in the sense that the rates of decay of |L( f ) − Q L,n ( f )| and En (L) can be arbitrarily close. That this is indeed
true is confirmed by the following theorem, which is an adaptation to the kernel setting of Theorem 2.1.1 in [24]. This theorem
is apparently originally due to Trojan and was brought to my attention by a recent article on quasi-Monte Carlo integration by
Owen and Pan [14], who in turn were informed about its existence by Erich Novak.
Theorem 2.2 (Trojan; Theorem 2.1.1 in [24]). For any positive sequence (δ_n)_{n=1}^∞ tending to zero the set

    A = { f ∈ H(K) : lim_{n→∞} |L(f) − Q_{L,n}(f)| / (δ_n E_n(L)) = 0 }

has empty interior in the norm of H(K). Here we use the convention 0/0 = 1.
Theorem 2.2 states that there are very few functions in H(K) for which the rate of decay of the error, |L(f) − Q_{L,n}(f)|, is faster than that of the worst-case error, E_n(L). Indeed, by Theorem 2.2 the set

    A^c = { f ∈ H(K) : lim sup_{n→∞} |L(f) − Q_{L,n}(f)| / (δ_n E_n(L)) > 0 }    (13)

is dense in H(K)—both in the RKHS norm and the supremum norm since the RKHS norm is the stronger of the two—for every positive sequence (δ_n)_{n=1}^∞ tending to zero.
Corollary 2.3. For any positive sequence (δ_n)_{n=1}^∞ tending to zero there is f ∈ H(K) such that

    lim sup_{n→∞} |L(f) − Q_{L,n}(f)| / (δ_n E_n(L)) = ∞,

where the convention 0/(δ_n × 0) = δ_n^{−1} is used.

Proof. Let (δ_n)_{n=1}^∞ be any positive sequence tending to zero and set δ_n′ = √δ_n, which is again a positive sequence tending to zero. By Theorem 2.2 there is f ∈ H(K) such that

    lim_{n→∞} r_n(f) = lim_{n→∞} |L(f) − Q_{L,n}(f)| / (δ_n′ E_n(L)) = 0

does not hold. That is, there is f ∈ H(K) such that lim sup_{n→∞} r_n(f) > 0. Then

    lim sup_{n→∞} |L(f) − Q_{L,n}(f)| / (δ_n E_n(L)) = lim sup_{n→∞} |L(f) − Q_{L,n}(f)| / (√δ_n δ_n′ E_n(L)) = lim sup_{n→∞} r_n(f) / √δ_n = ∞,

which proves the claim.

Corollary 2.4. For any non-negative sequence (δ_n)_{n=1}^∞ tending to zero there is f ∈ H(K) such that ∥f − I_n f∥_K ≥ δ_n for infinitely many n ∈ N.

Proof. It is sufficient to consider positive sequences because ∥f − I_n f∥_K ≥ δ_n holds trivially if δ_n = 0. By (I-EB) and Corollary 2.3, for any positive decreasing sequence (δ_n)_{n=1}^∞ tending to zero there is f ∈ H(K) such that

    (1/δ_n) ∥f − I_n f∥_K ≥ |L(f) − Q_{L,n}(f)| / (δ_n E_n(L)) ≥ 1

for infinitely many n ∈ N. This proves the claim.

Wenzel et al. [27] have essentially proved the above results in a very explicit way for the Wendland kernel

K(x, y) = max{1 − |x − y| , 0}

defined on Ω = [0, 1]. For this kernel the supremum of the power function decays as Θ(n^{−1/2}) if the points are quasi-uniform. In Section 6.2 of [27] it is shown that for the function f_α(x) = x^α with α ∈ (1/2, 1) it holds that

    sup_{x ∈ [0,1]} |f_α(x) − (I_n f_α)(x)| ≥ C_α n^{−α}

for a positive constant C_α and all n ∈ N if the points are quasi-uniform.
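This lower bound is easy to observe numerically. The sketch below, an illustration only, interpolates f_α(x) = x^α with α = 0.75 at uniform points using the kernel K(x, y) = max{1 − |x − y|, 0} and prints the supremum error on a fine grid; the last printed column, the error multiplied by n^α, stays roughly constant, in line with the bound of Wenzel et al.

```python
import numpy as np

def wendland(a, b):
    """The kernel K(x, y) = max{1 - |x - y|, 0} between one-dimensional point sets a and b."""
    return np.maximum(1.0 - np.abs(a[:, None] - b[None, :]), 0.0)

alpha = 0.75
f = lambda x: x ** alpha
x_eval = np.linspace(0.0, 1.0, 5001)

for n in [10, 20, 40, 80, 160]:
    nodes = np.linspace(0.0, 1.0, n)             # quasi-uniform points on [0, 1]
    coeffs = np.linalg.solve(wendland(nodes, nodes), f(nodes))   # interpolation coefficients
    interp = wendland(x_eval, nodes) @ coeffs    # (I_n f_alpha)(x) on a fine grid
    err = np.max(np.abs(f(x_eval) - interp))     # sup-norm interpolation error
    print(n, err, err * n ** alpha)              # the last column stays roughly constant
```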

3 Computable Error Bounds


The results in Section 2 demonstrate that there is no gap between the rate of convergence of the worst-case error and the error
for fixed elements of the RKHS, which confirms that (EB) and (I-EB) can be used as a basis of computable error bounds. We
therefore turn our attention to constructing constants c(f, n) ≥ 0, which depend on the values of f at X_n, such that

|L( f ) − Q L,n ( f )| ≤ c( f , n) En (L) with “high confidence” for f ∈ H(K). (14)

The meaning of “high confidence” is, of course, bound to be quite heuristic and no attempt at a rigorous definition will be made
in this article. First, one should dispense with any notion that the bound (14) can hold for all f ∈ H(K). We supply a proof in the RKHS setting of this basic fact of numerical analysis: no error bound that holds for all elements of a sufficiently rich function space can be constructed out of partial information.
Proposition 3.1. Let n ∈ N.
(i) There does not exist a function c : Rn → [0, ∞) such that ∥ f − I n f ∥K ≤ c( f |X n ) for every f ∈ H(K).
(ii) Suppose that there is f ∈ H(K) such that f |X n = 0 and L( f ) ̸= 0. Then there does not exist a function ϵ : Rn → [0, ∞) such
that |L( f ) − Q L,n ( f )| ≤ ϵ( f |X n ) for every f ∈ H(K).

Proof. Let f ̸≡ 0 be any function in H(K) that vanishes at X n , the existence of which follows from K being strictly positive-definite
and Ω being an infinite set. For example, one can take f to be the kernel interpolant I n+1 g of any function g such that g|X n = 0
and g(x n+1 ) ̸= 0. Set f a = a f for a > 0, so that ∥ f a − I n f a ∥K = a ∥ f ∥K > 0 but c( f a |X n ) = c( f |X n ) = c(0). Therefore the inequality
∥ f a − I n f a ∥K ≤ c( f a |X n ) is violated for a sufficiently large a, which proves (i). To prove (ii), note that by assumption the function
f ∈ H(K) as above can be selected such that L( f ) ̸= 0. Then |L( f ) − Q L,n ( f )| = |L( f )| > 0, and the rest of the proof is analogous
to that of (i).

The assumption in (ii) of Proposition 3.1 rules out L being a point evaluation functional δ x ( f ) = f (x) for x ∈ X n . To construct
c( f , n) in (14) we use ideas from Gaussian process regression. Sections 3.1 and 3.2 use maximum likelihood estimation and
cross-validation, respectively, of kernel hyperparameters to construct this coefficient. One does not however have to think in
terms of Gaussian processes because, as we show, the coefficients also arise from non-rigorous approximation theoretic reasoning.

3.1 Maximum Likelihood Estimation


It is well known that Gaussian process regression (or kriging) is equivalent to kernel-based approximation, in that the conditional
mean and variance that one obtains after conditioning the Gaussian process prior on point evaluations at X n equal the kernel
quadrature rule and the squared worst-case error [e.g., 23]. In statistics and machine learning, maximum likelihood estimation is a
popular method to select the parameters θ of a parametric kernel Kθ [17, Section 5.4.1] from a set Π of feasible parameters. For
recent examples on the use of maximum likelihood estimation to select the shape parameter λ > 0, as in the Matérn kernel (6),
in radial basis function literature, see [2] and [5, Section 9.4.3].
If f is modelled as a zero-mean Gaussian process with covariance kernel K_θ, the marginal likelihood of the data f_n ∈ R^n given the parameter θ is

    det(2π K_{θ,n,n})^{−1/2} exp( −(1/2) f_n^T K_{θ,n,n}^{−1} f_n ),

where the subscript θ is used to denote the kernel matrix for the parametric kernel K_θ. Maximisation of the marginal likelihood is equivalent to minimisation of the negative log-likelihood

    ℓ_ML(θ) = f_n^T K_{θ,n,n}^{−1} f_n + log det K_{θ,n,n}.    (15)

Any maximum likelihood estimate θ_ML of θ, which in general need not be unique, therefore satisfies θ_ML ∈ arg min_{θ ∈ Π} ℓ_ML(θ). Consider then the parameterisation θ = σ and K_σ(x, y) = σ^2 K(x, y) for a scale parameter σ > 0. It is straightforward to compute that the unique minimiser of (15) is

    c_ML(f, n) = σ_ML = arg min_{σ > 0} ℓ_ML(σ) = √( f_n^T K_{n,n}^{−1} f_n / n ) = ∥I_n f∥_K / √n,    (16)

where the equality f_n^T K_{n,n}^{−1} f_n = ∥I_n f∥_K^2 follows from (1) and the reproducing property. Behaviour of c_ML(f, n) for functions both within and without the RKHS has been studied in [8] and [25]. The following proposition collects basic properties of c_ML(f, n).


Proposition 3.2. If f ∈ H(K), then

    (1/√n) |f(x_m)| / √(K(x_m, x_m)) ≤ c_ML(f, n) ≤ ∥f∥_K / √n   for every m ∈ N and n ≥ m.    (17)

Moreover, if Ω is a metric space, K is continuous and {x_i}_{i=1}^∞ is dense in Ω, then

    c_ML(f, n) ∼ ∥f∥_K / √n   as n → ∞,    (18)

where 0/0 = 1.

Proof. The minimum-norm interpolation property yields

    ∥I_{{x_m}} f∥_K ≤ ∥I_{X_n} f∥_K = ∥I_n f∥_K ≤ ∥f∥_K

if x_m ∈ X_n, which is equivalent to n ≥ m. The upper bound in (17) follows immediately while the lower bound uses

    ∥I_{{x_m}} f∥_K^2 = f(x_m)^2 / K(x_m, x_m).

The asymptotic equality (18) is a consequence of lim_{n→∞} ∥I_n f∥_K = ∥f∥_K, which follows from Proposition 2.1.
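Computing c_ML(f, n) from (16) requires only the kernel matrix and the data vector. A minimal sketch, under the same kernel-routine convention and jitter assumption as in the earlier sketches, is:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def c_ml(kernel, x_nodes, f_values, jitter=1e-12):
    """Maximum likelihood coefficient c_ML(f, n) = sqrt(f_n^T K_{n,n}^{-1} f_n / n), cf. (16)."""
    n = len(x_nodes)
    K_nn = kernel(x_nodes, x_nodes) + jitter * np.eye(n)
    quad_form = f_values @ cho_solve(cho_factor(K_nn), f_values)   # equals ||I_n f||_K^2
    return np.sqrt(max(quad_form, 0.0) / n)
```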

By inserting c_ML(f, n) for c(f, n) in (14) we obtain the computable error estimate

    |L(f) − Q_{L,n}(f)| ≤ c_ML(f, n) E_n(L) = ( ∥I_n f∥_K / √n ) E_n(L),    (19)

where the right-hand side is asymptotically n^{−1/2} ∥f∥_K E_n(L) when Ω is a metric space, K is continuous and {x_i}_{i=1}^∞ is dense in Ω. From the results in Section 2 it is clear that (19) fails for a large number of elements of H(K). In fact, the error estimate fails for the dense set A^c in (13). Although somewhat disconcerting, this does not have to mean that (19) fails often for functions which are encountered in practice. The following approximation theoretic reasoning provides some non-rigorous justification for this claim. Because I_n(I_{n−1} f) = I_{n−1} f, we have from ∥I_n f∥_K^2 = f_n^T K_{n,n}^{−1} f_n that

    ∥I_n f − I_{n−1} f∥_K^2 = ∥I_n(f − I_{n−1} f)∥_K^2 = a^T K_{n,n}^{−1} a = (K_{n,n}^{−1})_{n,n} [f(x_n) − (I_{n−1} f)(x_n)]^2,

where a = (0, ..., 0, f(x_n) − (I_{n−1} f)(x_n)) ∈ R^n. From the expression for the power function in (2) and the block matrix inversion formula we get (K_{n,n}^{−1})_{n,n} = P_{n−1}(x_n)^{−2}. Thus

    ∥I_n f − I_{n−1} f∥_K^2 = ∥I_n(f − I_{n−1} f)∥_K^2 = ( [f(x_n) − (I_{n−1} f)(x_n)] / P_{n−1}(x_n) )^2.

Repeatedly using ∥I_n f∥_K^2 = ∥I_n f − I_{n−1} f∥_K^2 + ∥I_{n−1} f∥_K^2 and the above equation then yields the well known expression (e.g., [22, Theorem 6] and [10, Bemerkung 3.1.4])

    ∥I_n f∥_K^2 = Σ_{i=1}^n ∥I_i f − I_{i−1} f∥_K^2 = Σ_{i=1}^n ( [f(x_i) − (I_{i−1} f)(x_i)] / P_{i−1}(x_i) )^2.    (20)

Therefore

    c_ML(f, n)^2 = (1/n) Σ_{i=1}^n ( [f(x_i) − (I_{i−1} f)(x_i)] / P_{i−1}(x_i) )^2.    (21)
Because

    ∥I_n f∥_K^2 = Σ_{i=1}^n ( [f(x_i) − (I_{i−1} f)(x_i)] / P_{i−1}(x_i) )^2 ≤ ∥f∥_K^2

for every n ∈ N if f ∈ H(K) by the minimum-norm interpolation property, the series

    Σ_{n=1}^∞ a_n^2 = Σ_{n=1}^∞ ∥I_n f − I_{n−1} f∥_K^2 = Σ_{n=1}^∞ ( [f(x_n) − (I_{n−1} f)(x_n)] / P_{n−1}(x_n) )^2

converges. Let a_n be non-negative. Then |f(x_n) − (I_{n−1} f)(x_n)| = a_n P_{n−1}(x_n) for a square-summable sequence (a_n)_{n=1}^∞. Supposing that x_n can be replaced with any (or some) x ∈ Ω \ X_n in this equation—a proposition that we cannot substantiate rigorously—yields
    |f(x) − (I_n f)(x)| = c_n P_n(x)    (22)

for a square-summable (c_n)_{n=1}^∞, which suggests that at least for interpolation the coefficients c(f, n) in the error bound (14) should form a sequence which either is or, to be on the safe side, “is almost” square-summable. A sequence such that c(f, n) = O(n^{−1/2}), being a prototypical example of a “barely” non-square-summable sequence, is therefore a natural candidate. By Proposition 3.2, the maximum likelihood estimates c_ML(f, n) form such a sequence. Because it is not possible to define any sensible notion of a boundary between convergent and divergent series and the terms of a convergent series can decay arbitrarily slowly [9, § 41], the use of c(f, n) = O(n^{−1/2}) would not make (14) valid for all f ∈ H(K) even if (22) were true. But one could perhaps argue that square-summable sequences which are not O(n^{−1/2}) are anomalous. In any case, a square-root rate at x ∈ {x_i}_{i=1}^∞ can be proved if it is assumed that the sequence (a_n)_{n=1}^∞ is decreasing.


Proposition 3.3. If f ∈ H(K) and the sequence defined by a_n = ∥I_n f − I_{n−1} f∥_K is decreasing, then

    ∥I_n f − I_{n−1} f∥_K = |f(x_n) − (I_{n−1} f)(x_n)| / P_{n−1}(x_n) = o(1/√n).

Proof. Above we showed that Σ_{n=1}^∞ ∥I_n f − I_{n−1} f∥_K^2 ≤ ∥f∥_K^2 < ∞ if f ∈ H(K). Since the non-negative sequence (a_n)_{n=1}^∞ is decreasing and square-summable, a general result on series states that n a_n^2 → 0 as n → ∞ [9, p. 124]. This proves the claim.

By the above reasoning and Proposition 3.2, the maximum likelihood estimate (16) can therefore be interpreted as an approximation of ∥f∥_K modulated by a factor n^{−1/2} which ensures that the resulting error bound is not too conservative for those elements of the RKHS which are “well-behaved”. From the Gaussian process perspective there is a very simple explanation for the presence of n^{−1/2}: this factor is needed to make the maximum likelihood estimator unbiased. For suppose that f is a zero-mean Gaussian process with covariance kernel σ_0^2 K for some true scaling σ_0 > 0, so that Cov_f[f(x), f(y)] = E_f[f(x) f(y)] = σ_0^2 K(x, y). Then

    E_f[σ_ML^2] = E_f[c_ML(f, n)^2] = E_f[ f_n^T K_{n,n}^{−1} f_n / n ] = E_f[ tr(K_{n,n}^{−1} f_n f_n^T) ] / n = tr(K_{n,n}^{−1} σ_0^2 K_{n,n}) / n = σ_0^2 tr(Id_n) / n = σ_0^2,    (23)

which means that σ_ML^2 is an unbiased estimator of σ_0^2. Equation (23) implies that c_ML(f, n)^2 is on average Θ(1) for sample paths of the Gaussian process. In other words, ∥I_n f∥_K^2 = Θ(n) on average. This is related to the well known fact that the samples of a Gaussian process are not contained in the RKHS of its covariance kernel [4]. In the Matérn case the samples have, essentially, d/2 less smoothness than the RKHS, which leads one to expect that from the results in [11] it should follow that ∥I_n f∥_K^2 = Θ(h_{n,Ω}^{−d}) = Θ(n) for quasi-uniform points on a sufficiently regular Ω ⊂ R^d (see [8, Section 4] for details).
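The unbiasedness property (23) can be checked by simulation. The following sketch, which assumes the matern and c_ml helpers from the earlier sketches and fixes a hypothetical true scaling σ_0 = 2, draws zero-mean Gaussian process samples at the nodes with covariance σ_0^2 K_{n,n} and averages c_ML(f, n)^2.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0 = 2.0                                       # hypothetical true scaling
nodes = np.linspace(0.0, 1.0, 50)[:, None]
kernel = lambda a, b: matern(a, b, 1.5, 1.0)       # matern as sketched after (6)
chol = np.linalg.cholesky(kernel(nodes, nodes) + 1e-10 * np.eye(len(nodes)))

estimates = []
for _ in range(2000):
    f_n = sigma0 * chol @ rng.standard_normal(len(nodes))   # GP sample at the nodes
    estimates.append(c_ml(kernel, nodes, f_n) ** 2)

print(np.mean(estimates))                          # close to sigma0**2 = 4, consistent with (23)
```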
It is worth noting that the reasoning above appears to be very closely related to the d/2-gap observed by Schaback and Wendland [21] in connection to inverse theorems for kernel-based interpolation. Consider a Matérn kernel (6) of order ν > 0 on a sufficiently regular and bounded Ω ⊂ R^d and recall that

    sup_{x ∈ Ω} P_n(x) = O(h_{n,Ω}^ν)   and   sup_{x ∈ Ω} |f(x) − (I_n f)(x)| = O(h_{n,Ω}^ν)    (24)

for every f ∈ H(K) = H^{ν+d/2}(Ω). Schaback and Wendland [21, Theorem 6.1] have proved that if f : Ω → R is any function such that

    sup_{x ∈ Ω} |f(x) − (I_n f)(x)| = O(h_{n,Ω}^{ν+d/2+ϵ})    (25)

for every sequence of distinct points {x_i}_{i=1}^∞ in Ω and some ϵ > 0, then f ∈ H(K). As is evident, there is a gap of d/2 between the sufficient and necessary algebraic orders in (24) and (25). When the points are quasi-uniform, the gap is of order n^{−1/2} and thus one can think of the factor n^{−1/2} in the maximum likelihood estimate as a form of compensation for the lack of this factor in (24). Note that the proof of Theorem 6.1 in [21] uses the expansion (20) that we used to justify the n^{−1/2}-factor in c_ML(f, n).

3.2 Cross-Validation
In (probabilistic) cross-validation the objective function that the kernel parameters are to minimise is

    ℓ_CV(θ) = Σ_{i=1}^n [f(x_i) − (I_{n,i} f)(x_i)]^2 / P_{n,i}(x_i)^2 + Σ_{i=1}^n log P_{n,i}(x_i)^2,    (26)

where the subscripts denote that the interpolant and the power function are computed using evaluations at X_n \ {x_i}; see Section 4.2 in [3] or Section 5.4.1 in [17]. Note that the objective function (26) differs from the one that is typically used in kernel-based approximation literature [e.g., 19], ℓ̃_CV(θ) = Σ_{i=1}^n ( f(x_i) − (I_{n,i} f)(x_i) )^2. Because the kernel interpolant does not depend on scaling of the kernel, ℓ̃_CV cannot be used to select the parameter σ of K_σ(x, y) = σ^2 K(x, y).
Similarly to maximum likelihood estimation, it is straightforward to compute that the unique minimiser of (26) under the scale parametrisation θ = σ is

    c_CV(f, n) = σ_CV = arg min_{σ > 0} ℓ_CV(σ) = √( (1/n) Σ_{i=1}^n [f(x_i) − (I_{n,i} f)(x_i)]^2 / P_{n,i}(x_i)^2 ).

Note that c_CV(f, n) differs from the form derived for c_ML(f, n) in (21) only in which points the kernel interpolant and power function use in each term of the sum. From the arguments used in the derivation of (21) and the observation that I_{i−1} f = I_{i,i} f we obtain

    c_CV(f, n)^2 = (1/n) Σ_{i=1}^n ∥I_n f − I_{n,i} f∥_K^2 = (1/n) Σ_{i=1}^n ∥I_n(f − I_{n,i} f)∥_K^2.    (27)
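The coefficient c_CV(f, n) can be computed directly from its definition by looping over the left-out points. The sketch below, written for clarity rather than efficiency (it refactorises the kernel matrix n times), follows the same kernel-routine convention as the earlier sketches; in practice one would use the standard leave-one-out identities of Gaussian process regression (see [17]) to avoid the repeated factorisations.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def c_cv(kernel, x_nodes, f_values, jitter=1e-12):
    """Cross-validation coefficient c_CV(f, n) computed directly from its definition."""
    n = len(x_nodes)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                              # leave out the i-th point
        chol = cho_factor(kernel(x_nodes[keep], x_nodes[keep]) + jitter * np.eye(n - 1))
        k_i = kernel(x_nodes[i:i + 1], x_nodes[keep]).ravel()
        pred = k_i @ cho_solve(chol, f_values[keep])          # (I_{n,i} f)(x_i)
        power2 = kernel(x_nodes[i:i + 1], x_nodes[i:i + 1])[0, 0] - k_i @ cho_solve(chol, k_i)
        total += (f_values[i] - pred) ** 2 / max(power2, jitter)   # i-th term of the sum
    return np.sqrt(total / n)
```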

The following proposition collects a few basic properties of cCV ( f , n).


Proposition 3.4. The following statements hold:
(i) cCV ( f , n) > 0 if and only if f (x i ) ̸= 0 for some i ≤ n.
(ii) If f ∈ H(K), then cCV ( f , n) ≤ ∥ f ∥K .


Proof. Let us first prove (i). It is clear that cCV ( f , n) = 0 if f |X n = 0 because in this case f (x i ) = (I n,i f )(x i ) = 0 for every i ≤ n.
Suppose that f (x i ) ̸= 0 for some i ≤ n. The positivity of cCV ( f , n) is equivalent to f (x j ) ̸= (I n, j f )(x j ) for at least one j ≤ n. Assume
to the contrary that f (x j ) = (I n, j f )(x j ) for every j ≤ n, which means that each I n, j f interpolates f at X n . Therefore I n f = I n, j f
and consequently I n f ∈ span{K(·, x i )}ni=1, i̸= j for every j ≤ n. But because the kernel translates are linearly independent, this
implies that I n f ≡ 0, which contradicts the assumption that f (x i ) ̸= 0 for some i ≤ n. Hence cCV ( f , n) > 0.
Let us then prove (ii). From (27) and the minimum-norm interpolation property it follows that
    c_CV(f, n)^2 = (1/n) Σ_{i=1}^n ∥I_n(f − I_{n,i} f)∥_K^2 ≤ (1/n) Σ_{i=1}^n ∥f − I_{n,i} f∥_K^2 ≤ (1/n) Σ_{i=1}^n ∥f∥_K^2 = ∥f∥_K^2,
which proves the claim.

The upper bound in (ii) of Proposition 3.4 is clearly extremely conservative. Indeed, precisely the same chain of inequalities could have been used to show that c_ML(f, n) ≤ ∥f∥_K, which is conservative by a factor of n^{1/2}. Although we are unable to prove this, we believe that in most cases it should be expected that

    c_CV(f, n) ≤ c_ML(f, n) = ∥I_n f∥_K / √n.    (28)

For example, if Ω is a subset of R^d which equals the closure of its interior and {x_i}_{i=1}^∞ is dense in Ω, then

    lim_{n→∞} ∥I_n f∥_K = lim_{n→∞} ∥I_{n,i} f∥_K = ∥f∥_K

for every i ∈ N if f ∈ H(K) by Proposition 2.1. Therefore

    lim_{n→∞} ∥I_n(f − I_{n,i} f)∥_K^2 = 0   for every i ∈ N,    (29)

which in combination with (21) and (27) suggests that (28) ought to hold when n is sufficiently large because each term in the expansion for c_CV(f, n)^2 tends to zero as n increases while the terms in the expansion of c_ML(f, n)^2 are fixed. However, to make this argument rigorous the convergence (29) would need to be assumed or proved to be uniform over i ∈ N. What we can prove is limited to the following much weaker result, which is a generalisation of claim (ii) in Proposition 3.4.
Proposition 3.5. Suppose that f ∈ H(K) and let (δ_n)_{n=1}^∞ be any non-negative sequence such that max_{i=1,...,n} ∥f − I_{n,i} f∥_K ≤ δ_n for every n ∈ N. Then

    c_CV(f, n) ≤ δ_n   for every n ∈ N.    (30)

Proof. By the minimum-norm interpolation property, ∥I_n(f − I_{n,i} f)∥_K ≤ ∥f − I_{n,i} f∥_K ≤ δ_n for every i ≤ n. From (27) we thus obtain

    c_CV(f, n)^2 = (1/n) Σ_{i=1}^n ∥I_n(f − I_{n,i} f)∥_K^2 ≤ (1/n) Σ_{i=1}^n δ_n^2 = δ_n^2.

In certain cases it is possible to obtain the sequence (δ_n)_{n=1}^∞ in Proposition 3.5 explicitly. Assume that Ω is a compact subset of R^d and the kernel K is continuous. Then the integral operator T : L^2(Ω) → L^2(Ω) defined via

    (T f)(y) = ∫_Ω f(x) K(x, y) dx

is compact and self-adjoint. Moreover, the range T(L^2(Ω)) of T is contained in H(K). One can then show that [26, Section 11.5]

    ∥f − I_n f∥_K ≤ ∥T^{−1} f∥_{L^2(Ω)} ∥P_n∥_{L^2(Ω)}   if f ∈ T(L^2(Ω)).

In particular, if K is a Matérn kernel of order ν, Ω is sufficiently regular and the points are quasi-uniform,

    ∥f − I_{n,i} f∥_K = O(∥f − I_n f∥_K) = O(∥P_n∥_{L^2(Ω)}) = O(n^{−ν/d})

by (9). That is, in this case Proposition 3.5 gives c_CV(f, n) = O(n^{−ν/d}) if f ∈ T(L^2(Ω)). Although the bound (30) is likely to be somewhat conservative, this nevertheless demonstrates that for certain elements of the RKHS cross-validation may yield less conservative error bounds than maximum likelihood estimation.
As argued in Section 3.1, the maximum likelihood estimate c_ML(f, n) equals an approximation ∥I_n f∥_K of ∥f∥_K modulated by a factor of n^{−1/2}. Given (I-EB) and ∥f − I_n f∥_K ≤ ∥f∥_K, one should obtain a better error bound by approximating ∥f − I_n f∥_K instead of ∥f∥_K. However, ∥f − I_n f∥_K cannot be approximated directly as

    ∥I_n(f − I_n f)∥_K^2 ≈ ∥f − I_n f∥_K^2

because the left-hand side is always zero due to f − I_n f vanishing on X_n. But for each i ≤ n, we can use the approximation

    ∥I_n(f − I_{n,i} f)∥_K^2 ≈ ∥f − I_{n,i} f∥_K^2 ≈ ∥f − I_n f∥_K^2.

Because not every ∥I_n(f − I_{n,i} f)∥_K can be zero by Proposition 3.4 unless f vanishes on X_n, the average of the ∥I_n(f − I_{n,i} f)∥_K^2 would seem to make for a good approximation of ∥f − I_n f∥_K^2. As we have seen in (27), this average is precisely c_CV(f, n)^2:

    c_CV(f, n)^2 = (1/n) Σ_{i=1}^n ∥I_n(f − I_{n,i} f)∥_K^2.


3.3 Summary
Let us briefly summarise the findings of this section. The coefficients that we suggest using in the computable error bound (14) are

    c_ML(f, n) = √( f_n^T K_{n,n}^{−1} f_n / n ) = ∥I_n f∥_K / √n = √( (1/n) Σ_{i=1}^n ( [f(x_i) − (I_{i−1} f)(x_i)] / P_{i−1}(x_i) )^2 ) = √( (1/n) Σ_{i=1}^n ∥I_i f − I_{i−1} f∥_K^2 ) = √( (1/n) Σ_{i=1}^n ∥I_i(f − I_{i−1} f)∥_K^2 )

and

    c_CV(f, n) = √( (1/n) Σ_{i=1}^n [f(x_i) − (I_{n,i} f)(x_i)]^2 / P_{n,i}(x_i)^2 ) = √( (1/n) Σ_{i=1}^n ∥I_n f − I_{n,i} f∥_K^2 ) = √( (1/n) Σ_{i=1}^n ∥I_n(f − I_{n,i} f)∥_K^2 ).

The former, cML ( f , n), can be considered a modulated approximation to ∥ f ∥K while the latter, cCV ( f , n), approximates ∥ f − I n f ∥K .
It is therefore to be expected that using cCV ( f , n) results in tighter error bounds.
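Put together, the coefficients lead to a simple heuristic termination rule for kernel quadrature: add points until c(f, n) E_n(L) falls below a given tolerance. The following sketch of such a rule is illustrative only; it assumes the c_cv helper from the earlier sketch and user-supplied routines for L(K(·, x)) and K_{L,L}.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kernel_quadrature_with_stopping(kernel, f, point_seq, L_of_K, K_LL, tol, n_max=200):
    """Add points until the heuristic bound c_CV(f, n) * E_n(L) drops below tol.

    point_seq[k] is the k-th point, L_of_K(x) = L(K(., x)) and K_LL = L(K_L).
    """
    for n in range(2, n_max + 1):
        x_n = np.asarray(point_seq[:n])
        f_n = np.array([f(x) for x in x_n])
        chol = cho_factor(kernel(x_n, x_n) + 1e-12 * np.eye(n))
        LK_n = np.array([L_of_K(x) for x in x_n])       # the vector L(K_n)
        weights = cho_solve(chol, LK_n)                 # quadrature weights (3)
        Q = weights @ f_n                               # Q_{L,n}(f)
        wce = np.sqrt(max(K_LL - LK_n @ weights, 0.0))  # worst-case error E_n(L), cf. (4)
        if c_cv(kernel, x_n, f_n) * wce <= tol:         # computable error bound (14)
            return Q, n
    return Q, n_max
```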

4 Numerical Example
This section contains a simple numerical study of the computable error bounds proposed in Section 3. Related examples for the
maximum likelihood estimate can be found in [8, Section 5.2]. Stopping criteria derived from maximum likelihood estimation
and cross-validation have been used in numerical integration at lattice points by Rathinavel and Hickernell [18]. We consider the
integration functional

    L(f) = ∫_0^1 f(x) dx

and the Matérn kernel (6) with ν = 3/2 and λ = 1. For this kernel and integration functional we have

    K_L(x) = ∫_0^1 K(x, y) dy = 4/√3 − (1/3) exp(√3 (x − 1)) (3 + 2√3 − 3x) − (1/3) exp(−√3 x) (3x + 2√3)

and

    K_{L,L} = ∫_0^1 K_L(x) dx = (2/3) [ 2√3 − 3 + exp(−√3) (3 + √3) ],

so that the worst-case error is computable in closed form.
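The closed-form expressions above translate directly into code. The sketch below, which hard-codes ν = 3/2 and λ = 1 and assumes NumPy, evaluates K(x, y), K_L(x), K_{L,L} and the worst-case error E_n(L) for a one-dimensional point set.

```python
import numpy as np

SQ3 = np.sqrt(3.0)

def k_matern32(x, y):
    """Matérn kernel (6) with nu = 3/2 and lambda = 1 for one-dimensional point sets."""
    r = np.abs(np.subtract.outer(x, y))
    return (1.0 + SQ3 * r) * np.exp(-SQ3 * r)

def k_L(x):
    """K_L(x), the integral of K(x, y) over y in [0, 1], in the closed form given above."""
    x = np.asarray(x, dtype=float)
    return (4.0 / SQ3
            - np.exp(SQ3 * (x - 1.0)) * (3.0 + 2.0 * SQ3 - 3.0 * x) / 3.0
            - np.exp(-SQ3 * x) * (3.0 * x + 2.0 * SQ3) / 3.0)

K_LL = 2.0 / 3.0 * (2.0 * SQ3 - 3.0 + np.exp(-SQ3) * (3.0 + SQ3))

def worst_case_error(x_nodes):
    """E_n(L) in (4) for the integration functional on [0, 1]."""
    K_nn = k_matern32(x_nodes, x_nodes) + 1e-12 * np.eye(len(x_nodes))
    LK_n = k_L(x_nodes)
    return np.sqrt(max(K_LL - LK_n @ np.linalg.solve(K_nn, LK_n), 0.0))
```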

We consider the following six test functions of varying smoothness:

    f_1(x) = K_{ν=1, λ=0.5}(x − 0.6) + K_{ν=1, λ=0.5}(x − 0.2),    f_2(x) = K_{ν=1.6, λ=0.5}(x − 0.6),
    f_3(x) = K_{ν=3.1, λ=0.9}(x − 0.6),    f_4(x) = exp(−(x − 0.5)^2),    (31)
    f_5(x) = 1 + 0.5 x^2,    f_6(x) = K_{ν=1, λ=0.5}(x − 0.6) + K_{ν=2, λ=1.2}(x − 0.2).

The subscripts ν and λ for K denote the smoothness and scale parameters of the Matérn kernel (6). We use four different sequences of point sets:
1. uniform — endpoint included: Uniform points on [0, 1] with both 0 and 1 included:

       X_n = { 0, 1/(n − 1), ..., 1 − 1/(n − 1), 1 }.

   These point sets are not nested.

2. uniform — endpoint not included: Uniform points on [0, 1] with only 0 included:

       X_n = { 0, 1/n, ..., 1 − 1/n }.

   These point sets are not nested.

3. van der Corput — endpoint included: The first n elements of the van der Corput sequence {1, 0, 0.5, 0.25, 0.75, . . .} with 0 and 1 included. These point sets are nested.

4. van der Corput — endpoint not included: The first n elements of the van der Corput sequence {0, 0.5, 0.25, 0.75, . . .} with only 0 included. These point sets are nested.
Figures 1 and 2 show the behaviour of the ratios

    r_ML(n) = |L(f) − Q_{L,n}(f)| / ( c_ML(f, n) E_n(L) )   and   r_CV(n) = |L(f) − Q_{L,n}(f)| / ( c_CV(f, n) E_n(L) )    (32)

for f = f_1, ..., f_6 and n = 1, ..., 200. It is desirable that the ratios be as close to one (marked by the blue line) as possible. If the ratios exceed one, the error bounds are optimistic; if not, the bounds are conservative. The following observations can be made from the figures:


• Except for very small n, maximum likelihood estimation always yields error bounds that are not optimistic. For uniform
points the bounds appear to become increasingly conservative as n increases.
• When the endpoint is included, cross-validation appears to yield error bounds that are off only by a constant factor.
• However, when the endpoint is not included, cross-validation is optimistic for the test functions f2 , f3 , f4 and f5 . We are
unable to explain this phenomenon, which may be related to these four functions being the smoothest of our six test
functions.
Note the oscillation of the ratios for the least smooth test functions, f1 , f2 and f6 , when uniform points are used. This is likely
caused by the non-nestedness of the uniform point sets due to which the point configurations around the points x = 0.2 and
x = 0.6, at which these functions are not infinitely differentiable, keep changing from one n to another.
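For reference, a skeleton of the computation behind these figures, for the test function f_4 and the uniform points with the endpoint included, could look as follows. This is an illustrative reconstruction rather than the code used for the figures; it assumes the k_matern32, k_L, K_LL and worst_case_error helpers from the sketch above together with the c_ml and c_cv helpers from Section 3.

```python
import numpy as np
from scipy.special import erf

f4 = lambda x: np.exp(-(x - 0.5) ** 2)          # test function f_4 in (31)
L_f = np.sqrt(np.pi) * erf(0.5)                 # L(f_4), the integral of f_4 over [0, 1]

for n in [10, 50, 100, 200]:
    nodes = np.linspace(0.0, 1.0, n)            # uniform points, endpoint included
    f_n = f4(nodes)
    K_nn = k_matern32(nodes, nodes) + 1e-12 * np.eye(n)
    weights = np.linalg.solve(K_nn, k_L(nodes)) # quadrature weights (3)
    err = abs(L_f - weights @ f_n)              # |L(f) - Q_{L,n}(f)|
    E_n = worst_case_error(nodes)
    r_ml = err / (c_ml(k_matern32, nodes, f_n) * E_n)   # r_ML(n) in (32)
    r_cv = err / (c_cv(k_matern32, nodes, f_n) * E_n)   # r_CV(n) in (32)
    print(n, r_ml, r_cv)
```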

Acknowledgements
The author was supported by the Academy of Finland postdoctoral researcher grant #338567 “Scalable, adaptive and reliable
probabilistic integration”. Comments by the reviewers helped to simplify the proof of Proposition 3.1 and made Section 4 much
more interesting than it originally was.

References
[1] Arcangéli, R., de Silanes, M. C. L., and Torrens, J. J. (2007). An extension of a bound for functions in Sobolev spaces, with
applications to (m, s)-spline interpolation and smoothing. Numerische Mathematik, 107(2):181–211.
[2] Cavoretto, R. (2021). Adaptive radial basis function partition of unity interpolation: A bivariate algorithm for unstructured
data. Journal of Scientific Computing, 87(41).
[3] Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. (1988). A Bayesian approach to the design and analysis of computer
experiments. ORNL-6498, Oak Ridge National Laboratory.
[4] Driscoll, M. F. (1973). The reproducing kernel Hilbert space structure of the sample paths of a Gaussian process. Zeitschrift
für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 26(4):309–316.
[5] Fasshauer, G. and McCourt, M. (2015). Kernel-based Approximation Methods Using MATLAB. Number 19 in Interdisciplinary
Mathematical Sciences. World Scientific Publishing.
[6] Fasshauer, G. E. (2011). Positive definite kernels: Past, present and future. Dolomites Research Notes on Approximation,
4:21–63.
[7] Iske, A. (2018). Approximation Theory and Algorithms for Data Analysis. Springer.
[8] Karvonen, T., Wynne, G., Tronarp, F., Oates, C. J., and Särkkä, S. (2020). Maximum likelihood estimation and uncertainty
quantification for Gaussian process approximation of deterministic functions. SIAM/ASA Journal on Uncertainty Quantification,
8(3):926–958.
[9] Knopp, K. (1951). Theory and Application of Infinite Series. Blackie & Son, 2nd edition.
[10] Müller, S. (2008). Komplexität und Stabilität von kernbasierten Rekonstruktionsmethoden. PhD thesis, University of Göttingen.
[11] Narcowich, F. J., Ward, J. D., and Wendland, H. (2006). Sobolev error estimates and a Bernstein inequality for scattered
data interpolation via radial basis functions. Constructive Approximation, 24(2):175–186.
[12] Novak, E. and Woźniakowski, H. (2010). Tractability of Multivariate Problems. Volume II: Standard Information for Functionals,
volume 12 of EMS Tracts in Mathematics. European Mathematical Society.
[13] Oettershagen, J. (2017). Construction of Optimal Cubature Algorithms with Applications to Econometrics and Uncertainty
Quantification. PhD thesis, Faculty of Mathematics and Natural Sciences, University of Bonn.
[14] Owen, A. B. and Pan, Z. (2022). Where are the logs? arXiv:2110.06420v2.
[15] Paulsen, V. I. and Raghupathi, M. (2016). An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Number 152 in
Cambridge Studies in Advanced Mathematics. Cambridge University Press.
[16] Plaskota, L. (1996). Noisy Information and Computational Complexity. Cambridge University Press.
[17] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and
Machine Learning. MIT Press.
[18] Rathinavel, J. and Hickernell, F. J. (2019). Fast automatic Bayesian cubature using lattice sampling. Statistics and Computing,
29(6):1215–1229.
[19] Rippa, S. (1999). An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Advances
in Computational Mathematics, 11(2):193–210.
[20] Schaback, R. (2018). Superconvergence of kernel-based interpolation. Journal of Approximation Theory, 235:1–19.
[21] Schaback, R. and Wendland, H. (2002). Inverse and saturation theorems for radial basis function interpolation. Mathematics
of Computation, 71(238):669–681.


[22] Schaback, R. and Werner, J. (2006). Linearly constrained reconstruction of functions by kernels with applications to machine
learning. Advances in Computational Mathematics, 25:237.
[23] Scheuerer, M., Schaback, R., and Schlather, M. (2013). Interpolation of spatial data – A stochastic or a deterministic
problem? European Journal of Applied Mathematics, 24(4):601–629.
[24] Traub, J. F., Wasilkowski, G. W., and Woźniakowski, H. (1988). Information-Based Complexity. Computer Science and
Scientific Computing. Academic Press.
[25] Wang, W. (2021). On the inference of applying Gaussian process modeling to a deterministic function. Electronic Journal of
Statistics, 15(2):5014–5066.
[26] Wendland, H. (2005). Scattered Data Approximation. Number 17 in Cambridge Monographs on Applied and Computational
Mathematics. Cambridge University Press.
[27] Wenzel, T., Santin, G., and Haasdonk, B. (2021). Analysis of target data-dependent greedy kernel algorithms: Convergence
rates for f -, f · P- and f /P-greedy. arXiv:2105.07411v1.
[28] Wynne, G., Briol, F.-X., and Girolami, M. (2021). Convergence guarantees for Gaussian process means with misspecified
likelihoods and smoothness. Journal of Machine Learning Research, 22(123):1–40.

[Figure 1 shows four log-scale panels: r_ML(n) — uniform — endpoint included; r_CV(n) — uniform — endpoint included; r_ML(n) — van der Corput — endpoint included; r_CV(n) — van der Corput — endpoint included.]

Figure 1: The ratios r_ML(n) and r_CV(n) in (32) for the functions f = f_1, ..., f_6 in (31) and n = 1, ..., 200 when the point sets X_n include the endpoint.

[Figure 2 shows four log-scale panels: r_ML(n) — uniform — endpoint not included; r_CV(n) — uniform — endpoint not included; r_ML(n) — van der Corput — endpoint not included; r_CV(n) — van der Corput — endpoint not included.]

Figure 2: The ratios r_ML(n) and r_CV(n) in (32) for the functions f = f_1, ..., f_6 in (31) and n = 1, ..., 200 when the point sets X_n do not include the endpoint.
