Instance Optimal Function Recovery – asymptotic performance
Moritz Moeller a, Kateryna Pozharska a,b, Tino Ullrich a,∗
a Chemnitz University of Technology, Faculty of Mathematics
b Institute of Mathematics of NAS of Ukraine
Abstract
In this paper we study non-linear sampling recovery of multivariate functions using
techniques from compressed sensing. In the first part of the paper we prove that square
root Lasso (rLasso) with a particular choice of the regularization parameter λ > 0 as well
as orthogonal matching pursuit (OMP) after sufficiently many iterations provide noise blind
decoders which efficiently recover multivariate functions from random samples. In contrast
to basis pursuit the decoders (rLasso) and (OMP) do not require any additional information
on the width of the function class in L∞ and lead to instance optimal recovery guarantees.
In the second part of the paper we relate the findings to linear recovery methods such as least
squares (Lsqr) or Smolyak’s algorithm (Smolyak) and compare the performance in a model
situation, namely the approximation in Lq of periodic multivariate functions with Lp-bounded mixed derivative. The main observation is that (rLasso) and (OMP) outperform
Smolyak’s algorithm (sparse grids) in various situations, where 1 < p < 2 ≤ q < ∞. For
q = 2 they even outperform any linear method including (Lsqr) in combination with
recently proposed subsampled random points.
1 Introduction
This paper can be seen as a continuation of Jahn, T. Ullrich, Voigtlaender [18]. Here we
aim for a certain type of instance optimality when recovering a multivariate function f from
samples. The term instance optimality was coined by Cohen, Dahmen, DeVore [10]. Here
we use it in the context of function recovery from samples (decoding) and refer to an error
guarantee of type (1.3) and (1.4) which holds true for any instance f . A particular focus is put
on non-linear recovery methods (decoders) such as square root Lasso (rLasso), see Definition
3.1, and orthogonal matching pursuit (OMP), see Definition 3.2. The variants discussed here
use function samples at random points and provide recovery guarantees with high probability.
Square root Lasso (Least Absolute Shrinkage and Selection Operator) has been introduced by
Belloni, Chernozhukov and Wang [4], analyzed by H. Petersen, P. Jung [33] and already used for
function recovery in high dimensions by Adcock, Bao, Brugiapaglia [1]. The decoder (rLasso) turns out to be noise blind and does not require any further information about the function class to which the function f belongs. This is in contrast to the recently proposed variant of basis pursuit denoising investigated by the third named author together with Jahn and Voigtlaender [18], where we used certain widths in L∞ as a parameter for the ℓ1-minimization decoder.

Keywords and phrases: multivariate approximation; best m-term approximation; uniform norm; rate of convergence; sampling recovery.
2020 Mathematics subject classification: 42A10, 94A20, 41A46, 46E15, 42B35, 41A25, 41A17, 41A63
∗ Corresponding author, Email: [email protected]
We propose to use the recovery operator Rm,λ(·; X) based on the optimization program
(rLasso)
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \lambda\,\|Az - y\|_{\ell_2(m)}, \qquad (1.1)$$
where we choose λ = κ·√n. It will be a universal algorithm allowing for individual estimates
on the respective d-variate periodic function f ∈ C(Td ) of interest. The vector z ∈ CN will
later represent the coefficients in an appropriate basis expansion. To be more precise, for
many random samples X = {x1 , ..., xm } it holds for 2 ≤ q ≤ ∞ and all f ∈ C(Td ) that
$$\|f - R_{m,\kappa\sqrt{n}}(f;X)\|_{L_q} \le C\, n^{1/2-1/q}\big(\sigma_n(f;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(f;\mathcal{T}^d)_{L_\infty}\big) \qquad (1.3)$$
with high probability. See Section 2 for the notational conventions. A function is recovered from
the vector y = f (X) := (f (x1 ), ..., f (xm ))T ∈ Cm of point evaluations at random nodes where
the set of nodes is fixed in advance and is used for all functions f simultaneously. The task
is to solve the (rLasso) optimization program (1.1) with respect to a randomly subsampled
Fourier matrix A for the coefficient vector z of the approximant Rm,κ√n (f ; X). Practical
considerations like well-posedness, stability, etc. for (rLasso) have been published recently by
Berk, Brugiapaglia and Hoheisel [5].
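To make the optimization step concrete, the following is a minimal computational sketch of the program (1.1). It uses the convex-optimization package CVXPY, which is our own choice and not discussed in the paper; the matrix A, the sample vector y, the sparsity level n and the constant κ are assumed to be given, and the function name is ours. The sketch illustrates only the decoder, not the probabilistic analysis.

```python
import numpy as np
import cvxpy as cp

def solve_rlasso(A, y, n, kappa=1.0):
    """Sketch of the (rLasso) program (1.1) with lambda = kappa * sqrt(n)."""
    N = A.shape[1]
    z = cp.Variable(N, complex=True)            # coefficient vector in C^N
    lam = kappa * np.sqrt(n)                    # regularization parameter lambda
    objective = cp.Minimize(cp.norm(z, 1) + lam * cp.norm(A @ z - y, 2))
    cp.Problem(objective).solve()
    return z.value                              # a minimizer c^#(y)
```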
The situation is completely parallel for the greedy algorithm (OMP), see Definition 3.2 be-
low. The recovery guarantee (1.4) is an easy consequence of known results from compressed
sensing, see Foucart, Rauhut [16, Theorem 6.25] based on Zhang [40], Haviv, Regev [17] and
Brugiapaglia, Dirksen, Jung, Rauhut [7], together with the approach from [18]. The iterative
procedure used here is denoted with Pm,k indicating the k greedy steps and the m used samples.
The additional key feature is the fact that the approximant is always k-sparse. This property
is not present for (rLasso). The corresponding version of (1.3) is
$$\|f - P_{m,24n}(f;X)\|_{L_q} \le C\, n^{1/2-1/q}\big(\sigma_n(f;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(f;\mathcal{T}^d)_{L_\infty}\big) \qquad (1.4)$$
provided that (1.2) holds for the number of samples. Note that recently Dai, Temlyakov [14] considered weak orthogonal matching pursuit, which involves an additional weakness parameter in the greedy selection step. Remarkably, they obtained a similar control of the greedy approximation error in terms of the best n-term approximation, which leads to analogous results. Note that there are various numerical implementations of (OMP), see, e.g., Kunis, Rauhut [24]. Their implementation is based on the non-equispaced fast Fourier transform (NFFT).
We put our findings into perspective to other contemporary sampling recovery methods such
as sparse grids (Smolyak) and linear methods based on least squares with respect to hyperbolic
crosses on subsampled random points (Lsqr), see Figure 2. Our results are collected in Figure
1 below which illustrates the regions in the (1/p, 1/q) parameter domain for our model scenario
on the d-torus, namely spaces with Lp -bounded mixed derivative, denoted with Wpr , where the
error is measured in Lq . The different methods are known to be optimal, close to optimal or
at least superior to others. As optimality measure we use the classical notion of sampling numbers introduced in (2.1) below. The picture is only partially complete, which means that there are many open problems to which the reader is invited to contribute.
We consider mixed Wiener spaces Armix on the d-torus and function classes with bounded
mixed derivative Wpr as surveyed in Dũng, Temlyakov, T. Ullrich [12, Chapt. 2]. These spaces
have a relevant history in the former Soviet Union and serve as a powerful model for multivariate
approximation. Concretely, we study the situation Wpr in Lq where 1 < p ≤ 2 ≤ q and the case
of small smoothness 2 < p < ∞ and 1/p < r ≤ 1/2. We consider the worst-case setting where
the error is measured in Lq . It turned out in [18], see also Moeller, Stasyuk and the third named
author [28], that for several classical smoothness spaces non-linear recovery in L2 outperforms
any linear method (not only sampling). The results in this paper show that this effect partially
extends to Lq with 2 ≤ q < ∞. In fact, functions in mixed weighted Wiener spaces Armix provide an intrinsic sparsity with respect to the trigonometric system, so the additional gain in the rate does not come as a surprise. If r > 1/2 it holds for m ≳ Cr,d n log^3(n + 1)
that there is a non-linear recovery map Am based on (OMP) or (rLasso) using random points,
such that
$$\sup_{\|f\|_{\mathbf{A}^r_{\mathrm{mix}}}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r+1/q)}\,(\log(n+1))^{(d-1)r+1/2}$$
with high probability. We determine a polynomial rate of convergence r + 1/q which is sharp at least in the main rate (apart from logarithms) and outperforms any linear algorithm. The
situation is not so clear when studying Wpr classes in Lq . Surprisingly, in case 1 < p < 2 < q
and 1/p+1/q > 1 square root Lasso and orthogonal matching pursuit outperform any sampling
algorithm based upon sparse grids if d is large. The acceleration only happens in the logarithmic
term. We obtain for r > 1/p and m ≳ Cr,d n log3 (n + 1) a non-linear recovery map based on
(rLasso) and (OMP) using random samples such that
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log(n+1))^{(d-1)(r-2(\frac{1}{p}-\frac{1}{2}))+\frac{1}{2}}$$
with high probability. The result shows that for q = 2 and d large (rLasso) and (OMP) have
a faster asymptotic decay than any linear method and in particular (Lsqr). This effect has
been observed already for basis pursuit denoising in [18]. Note that the described effects do not appear when it comes to the uniform norm, i.e., q = ∞. This is a consequence of a general
result, described in Novak, Wozniakowski [32, Chapter 4.2.2], see also Remark 3.12.
The bound in (1.3) has the striking advantage that one may directly insert known bounds
from the literature, see [3, 38] and [12, Section 7] for an overview. Other approaches, like in
[20], require the embedding of the function class into the multivariate Wiener algebra A, which
is not always the case, not even for classical smoothness spaces like Sobolev spaces $W_p^{1/2}$ for
p > 2. This non-trivial fact sharpens Bernstein’s classical result on the absolute convergence of
Fourier series from functions in Hölder-Zygmund spaces, see Zygmund [41, Theorem VI.3.1,page
240] and the references therein, and will be proven in a forthcoming paper by the authors.
Smolyak’s sparse grids [34] in connection with functions providing bounded mixed derivative
or difference have a significant history not only for approximation theory, see [35], [12] and the
references therein, but also in scientific computing, see Bungartz, Griebel [8]. The underlying
spaces not only serve as a powerful model for multivariate approximation theory motivated by practical problems; sparse grid algorithms also allow for good (and sometimes optimal) approximation rates with significantly fewer sampling points. This approach is strongly related to hyperbolic cross approximation. In Figure 1 we indicate the parameter regions where (Smolyak) is known
to be optimal with respect to Gelfand/approximation numbers.
Finally, we would like to mention the recent developments in direction of least squares
methods (Lsqr). Beginning from the breakthrough result by Krieg, M. Ullrich [23], where it
was shown that sampling recovery for reproducing kernel Hilbert spaces in L2 is asymptotically
equally powerful as linear approximation, authors improved both the algorithms [29, 2] and the error guarantees [25], until the remaining logarithmic gap was finally closed by M. Dolbeault, D. Krieg, M. Ullrich [15] for RKHS which are sufficiently compact in L2. In Nagel, Schäfer, T. Ullrich [29] subsampled random points appeared for the first time. The final solution [15] is again heavily based on the solution of the Kadison-Singer problem [26] and is, however, highly non-constructive. As for the classical problem W2r in L2 (the midpoint in Figure 1), the algorithm uses the basis functions from the hyperbolic cross (leftmost picture in Figure 2) with n frequencies. The nodes on the spatial side result from a random draw (O(n log n)) together with
a subsampling to |X| = O(n) points (fourth picture in Figure 2). The resulting overdetermined
matrix is then used to recover the coefficients from the sample vector y = f (X) via a weighted
least squares algorithm. Apart from the Hilbert space setting, the situation Wpr in Lq has been
investigated in Krieg, Pozharska, M. Ullrich, T. Ullrich [21, 22].
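As an illustration of the least-squares pipeline just described, the sketch below builds a hyperbolic-cross frequency set, draws random points, and fits the coefficients by plain (unweighted) least squares. The constructive subsampling step and the weights from the cited works are omitted, and all names and parameters are our own simplification.

```python
import itertools
import numpy as np

def hyperbolic_cross(n, d, K):
    """Frequencies k in [-K, K]^d with prod_j max(1, |k_j|) <= n."""
    grid = range(-K, K + 1)
    return np.array([k for k in itertools.product(grid, repeat=d)
                     if np.prod([max(1, abs(kj)) for kj in k]) <= n])

def lsqr_recover(f, n, d, K, m, rng=np.random.default_rng()):
    """Plain least squares on m random points with hyperbolic-cross frequencies."""
    freqs = hyperbolic_cross(n, d, K)                     # frequency index set
    X = rng.random((m, d))                                # uniform random points on [0, 1)^d
    A = np.exp(2j * np.pi * X @ freqs.T) / np.sqrt(m)     # sampled Fourier matrix
    y = np.array([f(x) for x in X]) / np.sqrt(m)          # scaled sample vector
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)        # least squares coefficient fit
    return freqs, coeffs
```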
Notation. For a number a, by a+ we denote max{a, 0}, and by log(a) its natural logarithm. C^n shall denote the complex n-space and C^{m×n} the set of complex m × n matrices. Vectors and matrices are usually typeset boldface. For a vector v ∈ C^N and a set S ⊂ {1, ..., N} we mean by v_S ∈ C^N the restriction of v to S, where all other entries are set to zero. We denote by $\hat f(k) = \int_{\mathbb{T}^d} f(x)\exp(-2\pi i k\cdot x)\,dx$ the Fourier coefficient with respect to the frequency k ∈ Z^d and indicate by f ∈ T([−M, M]^d) that f is a trigonometric polynomial with support on the frequencies in the set [−M, M]^d ∩ Z^d. The notation Lq := Lq(T^d), 1 ≤ q < ∞, indicates the classical Lebesgue space of periodic functions on the d-torus T^d = [0, 1]^d, with the usual modifications for q = ∞. The notation C(T^d) stands for the space of continuous periodic functions on T^d with the sup-norm. All other function spaces of d-dimensional functions will be typeset boldface. With ℓ_p(N) we denote C^N quasi-normed by $\|x\|_p := \big(\sum_{k=1}^{N} |x_k|^p\big)^{1/p}$. Let X and Y be two normed spaces. The norm of an element x in X will be denoted by ∥x∥_X. The symbol X ↪ Y indicates that the identity operator from X to Y is continuous. For two sequences a_n and b_n we will write a_n ≲ b_n if there exists a constant c > 0 such that a_n ≤ c b_n for all n ∈ N. We will write a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n. The involved constants do not depend on n but may depend on other parameters.

[Figure 1: Regions in the (1/p, 1/q) plane for ϱ_n(W_p^r, L_q); in-figure labels: "Lsqr, rLasso, OMP (Remark 4.13)", "Smolyak (Remark 4.15)", "rLasso, OMP (Rem. 4.8, 4.9)", "Smolyak (Remark 4.14)". Magenta area: comparison to Smolyak, optimality not clear. Orange area: optimality w.r.t. Gelfand widths. Green area: optimality w.r.t. linear widths.]
[Figure 2: Hyperbolic cross in the frequency domain [−32, 32]^2 ∩ Z^2 and different sampling designs in d = 2; panels: hyperbolic cross, sparse grid, full grid, random + subsampling, random points.]
The sampling numbers ϱm(F)Y of a class F of continuous functions f : Ω → C, which is continuously embedded into Y, are defined as follows. For m > 1 define
$$\varrho_m(\mathbf{F})_Y := \inf_{X=\{x^1,\dots,x^m\}\subset\Omega}\ \inf_{R:\mathbb{C}^m\to Y}\ \sup_{f\in\mathbf{F}} \big\|f - R\big(f(x^1),\dots,f(x^m)\big)\big\|_Y . \qquad (2.1)$$
If one restricts to linear recovery operators R : C^m → Y, then the corresponding quantities are denoted by ϱ^lin_m(F)_Y and λ_m(F)_Y. In other words, we look for optimal linear operators with rank not exceeding m, i.e.,
$$\lambda_m(\mathbf{F})_Y := \inf_{\substack{A:\mathbf{F}\to Y\ \text{linear},\ \operatorname{rank} A\le m}}\ \sup_{f\in\mathbf{F}} \|f - A(f)\|_Y .$$
It is well known that linear algorithms are optimal if Y = L∞, see Novak, Wozniakowski [32, 4.2.2] and Remark 3.12.
Let I denote a countable index set and B = {bk ∈ C(Ω) : k ∈ I} a dictionary consisting of
continuous functions. Note that often the additional requirement is needed that the functions
in B are universally bounded in L∞ . For n ∈ N, we define the set of linear combinations of n
elements of B as
$$\Sigma_n := \Big\{\, \sum_{j\in J} c_j\, b_j(\cdot) \;:\; J\subset I,\ |J|\le n,\ (c_j)_{j\in J}\in\mathbb{C}^J \,\Big\}.$$
Note that the set Σn is “non-linear” (not a vector space), whereas the space V_J := span{b_j : j ∈ J} is linear. When dealing with Y = Lq(Ω, µ), 1 ≤ q ≤ ∞, for a Borel measure µ on Ω it is often desirable that
the bk(·) are pairwise orthogonal with respect to µ. We denote by
$$\sigma_n(f; B)_Y := \inf_{g\in\Sigma_n} \|f - g\|_Y \quad\text{and}\quad \sigma_n(\mathbf{F}; B)_Y := \sup_{f\in\mathbf{F}} \sigma_n(f; B)_Y$$
the corresponding width with respect to F. Let further
$$E_J(f; B)_Y := \inf_{g\in V_J} \|f - g\|_Y \quad\text{and}\quad E_J(\mathbf{F}; B)_Y := \sup_{f\in\mathbf{F}} E_J(f; B)_Y$$
denote the linear best approximation error for f as well as for the entire class F.
We may use an enumeration of [−D, D]^d ∩ Z^d = {k1, ..., kN} with N = (2D + 1)^d and define
the enumerated multivariate Fourier system as ej (·) := exp(2πikj ·), j = 1, ..., N . We will write
$$A = \frac{1}{\sqrt{m}}\,\big(e_j(x^\ell)\big)_{1\le \ell\le m,\ 1\le j\le N}.$$
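A direct transcription of this matrix into code, with our own enumeration of the frequency cube, could look as follows.

```python
import itertools
import numpy as np

def subsampled_fourier_matrix(X, D):
    """A = (1/sqrt(m)) * (exp(2*pi*i k_j . x_l))_{l=1..m, j=1..N}, k_j in [-D, D]^d."""
    m, d = X.shape
    freqs = np.array(list(itertools.product(range(-D, D + 1), repeat=d)))  # N = (2D+1)^d frequencies
    A = np.exp(2j * np.pi * X @ freqs.T) / np.sqrt(m)
    return A, freqs
```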
The following specifically tailored version of the multivariate de la Vallée Poussin mean will
be of use. In our special setting the operator VM takes the place of the quasi-projection P
which has been used in [18, Sect. 3.1]. The key features of the construction below are that
the operator is the identity on T ([−M, M ]d ) and the fact that it has a universally bounded
operator norm from L∞ to L∞ with respect to M and d. Indeed, from [18, Sect. 3.1] we obtain
$$\|V_M\|_{L_\infty\to L_\infty} \le \Big(1+\frac{1}{d}\Big)^{d} \le e, \qquad (2.2)$$
for the operator defined by
$$V_M(f)(x) = \sum_{k\in\mathbb{Z}^d} \hat f(k)\, v_k \exp(2\pi i k\cdot x), \qquad (2.3)$$
with weights $v_k = \prod_{j=1}^{d} v_{k_j}$ satisfying
$$v_{k_j} = \begin{cases} 1, & |k_j| \le M,\\[2pt] \dfrac{(2d+1)M - |k_j|}{2dM}, & M < |k_j| \le (2d+1)M,\\[2pt] 0, & |k_j| > (2d+1)M. \end{cases} \qquad (2.4)$$
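A direct transcription of the weights (2.4) into code, with variable names of our choosing, reads:

```python
def vdp_weight(k, M, d):
    """Tensorized de la Vallee Poussin weight v_k = prod_j v_{k_j} from (2.4)."""
    v = 1.0
    for kj in k:
        a = abs(kj)
        if a <= M:
            continue                                    # factor 1
        if a <= (2 * d + 1) * M:
            v *= ((2 * d + 1) * M - a) / (2 * d * M)    # linear decay on the transition zone
        else:
            return 0.0                                  # vanishes outside [-(2d+1)M, (2d+1)M]^d
    return v
```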
3 Instance optimal function recovery – guarantees
We will consider two different nonlinear decoders, square root Lasso (rLasso) and orthogonal
matching pursuit (OMP). As for the first one see H. Petersen, Jung [33] and the references therein.
The advantage of (rLasso) over basis pursuit denoising as used in [18] is its “noise blindness”, which means that we do not have to incorporate additional information from the function class f belongs to. This feature is also present for greedy methods such as (OMP), see Foucart, Rauhut [16, 6.4] or Dai, Temlyakov [14, Paragraph after Cor. 1.2].
We will tailor square root Lasso and orthogonal matching pursuit to the function recovery
problem. For the general scenario described above, the decoder maps Rm,λ : C(Ω) → C(Ω)
and Pm,k : C(Ω) → C(Ω) are chosen in the following way. We fix a finite index set J and X = {x^1, . . . , x^m} ⊂ Ω and set
$$A := \frac{1}{\sqrt{m}}\,\big(b_j(x^\ell)\big)_{1\le\ell\le m,\ j\in J} \in \mathbb{C}^{m\times|J|} \qquad (3.1)$$
and $y = f(X)/\sqrt{m} \in \mathbb{C}^m$.
Definition 3.1 (rLasso). Let λ > 0 and m ∈ N. Put
$$R_{m,\lambda}(f; X) := \sum_{j\in J} (c^\#(y))_j\, b_j(\cdot) \in V_J \subset L_\infty, \qquad (3.2)$$
where c# (y) ∈ C|J| is any (fixed) solution of the square root Lasso minimization problem
with respect to the matrix (3.1) and the vector of samples y ∈ Cm . This defines a (not neces-
sarily linear) map Rm,λ : C(Ω) → C(Ω). The parameter λ > 0 is chosen below and may depend
on other parameters.
Definition 3.2 (OMP). Let k ∈ N and let J, A, X and y be as above. Then
$$P_{m,k}(f; X) := \sum_{j\in J} (c^k(y))_j\, b_j(\cdot) \in V_J \subset L_\infty, \qquad (3.4)$$
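For orientation, a minimal sketch of the standard (OMP) iteration from [16, Chapt. 3], which produces a k-sparse coefficient vector c^k(y) after k greedy steps, is given below; the variable names are ours and details of the exact variant analyzed here may differ.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: k greedy steps, returns a k-sparse vector in C^N."""
    m, N = A.shape
    support, residual = [], y.astype(complex)
    c = np.zeros(N, dtype=complex)
    c_S = np.zeros(0, dtype=complex)
    for _ in range(k):
        # greedy step: pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        if j not in support:
            support.append(j)
        # orthogonal projection: least squares on the current support
        c_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ c_S
    c[support] = c_S
    return c
```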
Definition 3.3 (ℓq-robust null space property). Given 1 ≤ q < ∞, m, N ∈ N and a norm ∥·∥ on C^m, the matrix A ∈ C^{m×N} satisfies the ℓq-robust null space property of order n < N if there exist constants 0 < ϱ < 1 and τ > 0 such that for all v ∈ C^N and all S ⊂ [N] with |S| ≤ n
$$\|v_S\|_{\ell_q} \le \frac{\varrho}{n^{1-1/q}}\,\|v_{S^c}\|_{\ell_1} + \tau\,\|Av\| .$$
We will use this property in the following proposition which is a direct consequence of H.
Petersen and P. Jung [33, Theorem 3.1].
Proposition 3.4. Let A ∈ C^{m×N} be a matrix satisfying the ℓ2-robust null space property of order n in the form
$$\|v_S\|_{\ell_2} \le \varrho\, n^{-1/2}\,\|v_{S^c}\|_{\ell_1} + \tau\,\|Av\|_{\ell_2}.$$
Then there is a constant κ > 0 (depending only on τ) such that for any y ∈ C^m and c ∈ C^N a solution c^# ∈ C^N of the (rLasso) minimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \kappa\sqrt{n}\,\|Az - y\|_{\ell_2(m)} \qquad (3.7)$$
satisfies
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad (3.8)$$
and
$$\|c - c^\#\|_{\ell_2} \le \beta\,\frac{\sigma_n(c)_{\ell_1}}{\sqrt{n}} + \delta\,\|Ac - y\|_{\ell_2}, \qquad (3.9)$$
where
$$\sigma_n(c)_{\ell_1} := \inf_{z\in\mathbb{C}^N,\ \|z\|_{\ell_0}\le n} \|c - z\|_{\ell_1},$$
with ∥z∥ℓ0 := |{1 ≤ j ≤ N : zj ̸= 0}|. The constants β, δ > 0 only depend on ϱ and τ .
Proof. [33, Theorem 3.1] says we may choose λ > τ√n in the optimization program (3.7) to obtain (3.9). Clearly, the ℓ2-robust null space property w.r.t. the Euclidean norm ∥·∥2 implies the ℓ1-robust null space property with modified norm ∥·∥ = n^{1/2}∥·∥_{ℓ2(m)}. Again [33, Theorem 3.1] says that we may choose λ > τ in the modified optimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \lambda\,\|Az - y\|, \qquad \|\cdot\| := \sqrt{n}\,\|\cdot\|_{\ell_2(m)},$$
to obtain
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\,\|Ac - y\| = \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2},$$
which is (3.8). Hence, (3.7) works for q = 1 and q = 2 simultaneously and yields the bounds (3.9) and (3.8).
Theorem 3.5. There exist universal constants α, β, γ, δ, κ > 0 such that the following holds
true. Let D ∈ N, N = (2D + 1)^d and n, m ∈ N satisfy (3.10). Then, with probability at least 1 − N^{−γ log(n+1)} with respect to the random draw of the sampling nodes X = {x^1, . . . , x^m} used to build the subsampled Fourier matrix A, the following holds: given c ∈ C^N and y ∈ C^m, any solution c^# ∈ C^N of the (rLasso) minimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \kappa\sqrt{n}\,\|Az - y\|_{\ell_2(m)} \qquad (3.11)$$
satisfies
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad (3.12)$$
and
$$\|c - c^\#\|_{\ell_2} \le \beta\,\frac{\sigma_n(c)_{\ell_1}}{\sqrt{n}} + \delta\,\|Ac - y\|_{\ell_2}. \qquad (3.13)$$
Note that since N ≥ 2, the number 1 − N^{−γ log(n+1)}, and therefore also the probability of choosing a vector of “good” sampling points X = {x^1, . . . , x^m}, is close to 1.
Proof. Choosing α large enough in (3.10) ensures that A has RIP of order 2n with RIP constant
δ2n < 1/3 with the mentioned probability, see [17, Theorem 3.7]. In fact, it holds for all c ∈ CN
with ∥c∥0 ≤ 2n that
$$(1-\delta_{2n})\,\|c\|_2^2 \le \|Ac\|_2^2 \le (1+\delta_{2n})\,\|c\|_2^2. \qquad (3.14)$$
By Theorem 3.6 below we have that A then provides the ℓ2-robust null space property (NSP) of order n with constants ϱ, τ depending only on δ2n < 1/3. Finally, we apply Proposition 3.4
and conclude the proof.
Theorem 3.6 (RIP implies robust NSP). For A ∈ C^{m×N} assume that A satisfies RIP with δ2n < 1/3, see (3.14). Then A satisfies the ℓ2-robust null space property (NSP) of order n, i.e.,
$$\|v_S\|_2 \le \frac{\rho}{\sqrt{n}}\,\|v_{S^C}\|_1 + \tau\,\|Av\|_2 \qquad \forall v\in\mathbb{C}^N,\ \forall S\subset[N],\ |S|\le n, \qquad (3.15)$$
with constants 0 < ρ < 1 and τ > 0 depending only on δ2n.
Proof. Let v ∈ CN . For v ∈ ker A \ {0}, it is enough to consider S = Jn (v) (index set of largest
entries of v in absolute value). We partition [N ] into the index sets
S0 := S = Jn(v),  S1 := J2n(v) \ Jn(v),  S2 := J3n(v) \ J2n(v),  . . . ,
and note that it is enough to show the ℓ2-robust NSP for S0 = S = Jn(v). We estimate as follows:
$$\begin{aligned}
\|v_S\|_2^2 &\le \frac{1}{1-\delta_n}\,\|Av_S\|_2^2 = \frac{1}{1-\delta_n}\Big\langle Av_{S_0},\, Av - \sum_{k\ge 1} Av_{S_k}\Big\rangle \\
&\le \frac{1}{1-\delta_n}\Big(|\langle Av_{S_0}, Av\rangle| + \sum_{k\ge 1} |\langle Av_{S_0}, Av_{S_k}\rangle|\Big) \\
&\le \frac{1}{1-\delta_n}\Big(\|Av_{S_0}\|_2\,\|Av\|_2 + \delta_{2n}\sum_{k\ge 1}\|v_{S_0}\|_2\,\|v_{S_k}\|_2\Big) \\
&\le \frac{1}{1-\delta_n}\Big(\sqrt{\delta_n+1}\,\|v_{S_0}\|_2\,\|Av\|_2 + \delta_{2n}\,\|v_{S_0}\|_2\cdot\frac{1}{\sqrt{n}}\sum_{k\ge 1}\|v_{S_{k-1}}\|_1\Big).
\end{aligned}$$
After division by ∥vS∥2 and Hölder's inequality, this yields
$$\|v_S\|_2 \le \frac{\sqrt{1+\delta_{2n}}}{1-\delta_{2n}}\,\|Av\|_2 + \frac{\delta_{2n}}{1-\delta_{2n}}\,\frac{1}{\sqrt{n}}\big(\|v_S\|_1 + \|v_{S^C}\|_1\big).$$
Since ∥vS∥1 ≤ √n ∥vS∥2 and δ2n < 1/3, the term involving ∥vS∥1 can be absorbed into the left-hand side, which gives (3.15) with ρ = δ2n/(1 − 2δ2n) < 1 and τ = √(1+δ2n)/(1 − 2δ2n).
Theorem 3.7. There exist universal constants C, α, κ, γ > 0 such that the following holds for M, n ∈ N, where we put D := (2d + 1)M. Drawing at least
where Rm,κ√n denotes the (rLasso) decoder from Definition 3.1 such that the approximant is con-
tained in the space of trigonometric polynomials T ([−(2d + 1)M, (2d + 1)M ]d ).
Proof. To prove Theorem 3.7 for 2 ≤ q ≤ ∞ we first derive the L∞-bound for the worst-case error and combine it via interpolation with the L2-bound.
For the L∞-bound we will use the control over ∥c − c#∥ℓ1 in Theorem 3.5, whereas the control on ∥c − c#∥ℓ2 serves for the L2-bound. Let ε > 0. Take an arbitrary f ∈ C(Td) and let f* = VM s, for s such that ∥f − s∥L∞ ≤ σn(f; T^d)L∞ + ε. The coefficient vector c of f* is n-sparse. We also set y = f(X)/√m and e = (f(X) − f*(X))/√m. Hence ∥Ac − y∥2 = ∥e∥ℓ2 ≤ ∥f(X) − f*(X)∥ℓ∞. Then, taking into account the boundedness of the Fourier system (see Section 2), we have from Theorem 3.5
$$\begin{aligned}
\|f^* - R_{m,\lambda}(f; X)\|_{L_\infty} &\le \sum_{j=1}^{N} \big|c_j - c^\#_j(y)\big|\,\|e_j(\cdot)\|_{L_\infty} \le \|c - c^\#\|_{\ell_1} \\
&\le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad\qquad (3.16)\\
&\le \delta\sqrt{n}\,\|f(X) - f^*(X)\|_{\ell_\infty}.
\end{aligned}$$
It remains to verify the second estimate in (3.17), which is a standard computation. We decided to provide the short proof for the convenience of the reader. Let g ∈ T([−M, M]^d) denote an arbitrary trigonometric polynomial. Clearly, VM g = g and therefore
$$\|f - V_M f\|_{L_\infty} = \|f - g + g - V_M f\|_{L_\infty} = \|f - g - V_M(f - g)\|_{L_\infty} \le \|f - g\|_{L_\infty} + \|V_M(f - g)\|_{L_\infty} \le (1 + e)\,\|f - g\|_{L_\infty}. \qquad (3.19)$$
Finally, for 2 < q < ∞ we use the interpolation inequality
$$\|f - R_{m,\lambda}(f; X)\|_{L_q} \le \|f - R_{m,\lambda}(f; X)\|_{L_2}^{1-\theta}\, \|f - R_{m,\lambda}(f; X)\|_{L_\infty}^{\theta} = \|f - R_{m,\lambda}(f; X)\|_{L_2}^{2/q}\, \|f - R_{m,\lambda}(f; X)\|_{L_\infty}^{1-2/q},$$
where the interpolation parameter θ has to be chosen in such a way that 1/q = (1 − θ)/2 + θ/∞
which yields θ = 1 − 2/q. This concludes the proof.
Proof. We combine [17, Theorem 3.7] and [16, Theorem 6.25]. Precisely, choosing α large
enough in (3.22) ensures that A has RIP of order 13n with RIP-constant δ13n < 1/6, see (3.14)
and [17, Theorem 3.7]. This is required in [16, Theorem 6.25] to guarantee the recovery bounds
(3.23), (3.24).
where Pm,k denotes the (OMP) decoder from Definition 3.2 after k iterations.
Proof. The proof is completely analogous to the proof of Theorem 3.7. This time we use
Proposition 3.8 instead of Theorem 3.5.
Corollary 3.10. Let F ,→ C(Td ) denote a function class compactly embedded into the space
of continuous functions on the d-torus. Let further d, m, n, M ∈ N such that M ≥ d and
where the quantity ϱm(F)Lq denotes the m-th sampling width and is defined in (2.1). The constants C, α are inherited from either Theorem 3.9 or Theorem 3.7.
Remark 3.11. Similarly to Krieg [20, Lemma 9] one may prove a version of the above result
which looks as follows. Under the condition (1.2) we have
where Rm denotes either the (rLasso) decoder Rm,κ√n or the (OMP) decoder Pm,24n. Note that this version differs from the one in [20, Lemma 9], since the author does not use the
L∞ -best approximation on the right-hand side. Let us emphasize that the bounds in Theorems
3.7 and 3.9 have the advantage that one can directly insert known bounds for L∞ widths without
relying on the embedding into the Wiener algebra A, see Temlyakov [36] and Temlyakov, T.
Ullrich [38], as well as Dũng, Temlyakov, T. Ullrich [12, Chapt. 4, 7]. This is relevant in
situations, when the function class is not embedded into A, as, for instance, the space Wpr with
p > 2 and 1/p < r ≤ 1/2. Indeed, this space is compactly embedded into C(Td ) but not in A,
which will be proved in a forthcoming paper by the authors.
Remark 3.12 (Linear recovery in L∞). Note that, by a general fact (see [32, Chapter 4.2.2], also [11]), if the target space is Y = L∞ (as in our case) and one has established an estimate for the non-linear sampling numbers, then there exists a linear algorithm with the same error bound. However, we only know of the existence of such an algorithm, without any deterministic construction.
4 Examples
We will now discuss examples where Theorems 3.7 and 3.9 improve existing results
in certain directions. We start in Subsection 4.1 with the mixed Wiener spaces Armix , a gener-
alization of the classical Wiener algebra A. These have been studied a lot due to their good
embedding properties and their connection to Barron classes. Recent work on these spaces and their approximation properties includes Jahn, T. Ullrich and Voigtlaender [18]; Kolomoitsev, Lomako, Tikhonov [19]; Krieg [20]; Moeller [27]; Moeller, Stasyuk and T. Ullrich [28]; V.K. Nguyen, V.N. Nguyen and Sickel [30]; and others.
Definition 4.1. For r ≥ 0 we define the mixed Wiener space Armix as the space of functions f ∈ L1(Td) with finite norm
$$\|f\|_{\mathbf{A}^r_{\mathrm{mix}}} := \sum_{k\in\mathbb{Z}^d} \prod_{i=1}^{d} (1 + |k_i|)^r\, |\hat f(k)|,$$
where fˆ(k) are the respective Fourier coefficients. In the univariate case we use the notation Ar, since the smoothness is no longer mixed. In the case r = 0 we get the Wiener algebra, which will be denoted in what follows by A.
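For a trigonometric polynomial with finitely many non-zero Fourier coefficients, this norm can be evaluated directly; a small helper of our own making is:

```python
import numpy as np

def wiener_mix_norm(fhat, r):
    """||f||_{A^r_mix} = sum_k prod_i (1 + |k_i|)^r |fhat(k)| for a dict {k: fhat(k)}."""
    return sum(np.prod([(1 + abs(ki)) ** r for ki in k]) * abs(c)
               for k, c in fhat.items())

# example: f(x) = 1 + 0.5i * exp(2*pi*i (x_1 - 2 x_2))
# wiener_mix_norm({(0, 0): 1.0, (1, -2): 0.5j}, r=1.0)
```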
In Subsection 4.2 we investigate how and in which cases the (rLasso) can beat linear
algorithms for spaces of functions with bounded mixed derivative defined in the following way.
Define for x ∈ T and r > 0 the univariate Bernoulli kernel
$$F_r(x) := 1 + 2\sum_{k=1}^{\infty} k^{-r}\cos(2\pi kx) = \sum_{k\in\mathbb{Z}} \max\{1,|k|\}^{-r}\exp(2\pi i kx)$$
and define the multivariate Bernoulli kernels as $F_r(x) := \prod_{j=1}^{d} F_r(x_j)$, x ∈ T^d.
Definition 4.2. Let r > 0 and 1 < p < ∞. Then Wpr is defined as the normed space of all
elements f ∈ Lp (Td ) which can be written as
$$f = F_r * \varphi := \int_{\mathbb{T}^d} F_r(\cdot - y)\,\varphi(y)\,dy$$
for some φ ∈ Lp (Td ), equipped with the norm ∥f ∥Wpr := ∥φ∥Lp (Td ) .
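Since convolution with F_r acts as a Fourier multiplier, the definition can also be read on the Fourier side; for p = 2, Parseval's identity gives the explicit expression
$$\|f\|_{\mathbf{W}^r_2}^2 = \sum_{k\in\mathbb{Z}^d} \Big(\prod_{j=1}^{d} \max\{1,|k_j|\}^{2r}\Big)\, |\hat f(k)|^2,$$
which in particular exhibits W^r_2 as a reproducing kernel Hilbert space on T^d for r > 1/2.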
In order to prove the statements, we will use embeddings of Armix and Wpr into the Besov
spaces Brp,θ of functions with bounded mixed differences.
Definition 4.3. Let r ≥ 0, 1 ≤ θ ≤ ∞, 1 < p < ∞. Then the periodic Besov space Brp,θ with
mixed smoothness is defined as the normed space of all elements f ∈ Lp (Td ), endowed with the
norm (with the usual modifications if θ = ∞)
$$\|f\|_{\mathbf{B}^r_{p,\theta}} := \Big(\sum_{s\in\mathbb{N}_0^d} 2^{|s|_1 r\theta}\,\Big\|\sum_{k\in\rho(s)} \hat f(k)\exp(2\pi i k\cdot x)\Big\|_p^{\theta}\Big)^{1/\theta}, \qquad 1\le\theta<\infty,$$
where
$$\rho(s) := \big\{\, k\in\mathbb{Z}^d \;:\; \lfloor 2^{s_j-1}\rfloor \le |k_j| < 2^{s_j},\ j=1,\dots,d \,\big\}, \qquad s\in\mathbb{N}_0^d. \qquad (4.1)$$
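The dyadic blocks (4.1) can be enumerated directly; a small helper with names of our choosing is:

```python
import itertools

def dyadic_block(s):
    """Frequencies k with floor(2^(s_j - 1)) <= |k_j| < 2^(s_j) for each j, cf. (4.1)."""
    axes = []
    for sj in s:
        lo = 2 ** (sj - 1) if sj > 0 else 0
        hi = 2 ** sj
        axes.append([k for k in range(-hi + 1, hi) if lo <= abs(k) < hi])
    return list(itertools.product(*axes))

# example: dyadic_block((1, 2)) gives the 2 * 4 = 8 frequencies with |k_1| = 1 and 2 <= |k_2| < 4
```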
4.1 Recovery of functions belonging to mixed weighted Wiener spaces
Corollary 4.4. Let r > 1/2 and 2 ≤ q ≤ ∞. Let further d, n ∈ N and m > Cr,d n log^3(n + 1) with an appropriate constant Cr,d > 0. Then there is a non-linear recovery operator Am based on (rLasso) or (OMP) using m random samples such that with high probability
$$\sup_{\|f\|_{\mathbf{A}^r_{\mathrm{mix}}}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r+1/q)}\,(\log(n+1))^{(d-1)r+1/2}. \qquad (4.2)$$
Proof. Using [18, Lemma 4.3] and choosing M := ⌊n(r+1/2)/r ⌋ we obtain (4.2) as a direct
consequence of Theorems 3.7, 3.9.
Remark 4.5. The upper bound in Corollary 4.4 is sharp in the main order, which even co-
incides with those for the Gelfand widths. One can show this by using the good embedding properties of Wiener spaces and exact order estimates for the Gelfand widths of Besov space embeddings by Vybiral [39, Theorem 4.12]. Indeed,
$$\varrho_n(\mathbf{A}^r_{\mathrm{mix}}(\mathbb{T}^d))_{L_q} \ge \varrho_n(A^r(\mathbb{T}))_{L_q} \ge \varrho_n(B^{r+1/2}_{2,1})_{L_q} \ge \varrho_n(B^{r+1/2}_{2,1})_{B^0_{q,\infty}} \ge c_n(B^{r+1/2}_{2,1})_{B^0_{q,\infty}} \asymp n^{-(r+1/q)}. \qquad (4.3)$$
In the first line we retreat to the one-dimensional setting.
Remark 4.6 (Nonlinearity helps for Armix). Comparing this upper bound for non-linear approximation with lower bounds for linear approximation shows how much one gains from non-linearity. Indeed, [30, Theorem 4.7] states (in our notation, putting r = 1, s = r) that, for r > 0,
$$\varrho^{\mathrm{lin}}_n(\mathbf{A}^r_{\mathrm{mix}}(\mathbb{T}^d))_{L_q} \ge n^{-r}\,\log(n)^{(d-1)r}.$$
Comparing linear and non-linear approximation of mixed Wiener spaces in Lq, the difference between the main rates is always 1/q; hence the maximal possible gain is attained for q = 2, while for q = ∞ the main rates coincide.
The sharp upper bounds for linear recovery from samples in a more general setting, in particular for the worst-case errors of recovery of functions from weighted Wiener spaces by the Smolyak algorithm, were obtained in [19], see, e.g., Theorem 5.1 and Remark 6.4. In [21, Corollary 23] upper bounds, which are sharp in the case q = 2, were proved for an algorithm that uses subsampled random points; see also [21, Remark 24] for the comparison with the Smolyak algorithm.
Corollary 4.7 (Lower right region Wpr ). Let 1 < p ≤ 2 ≤ q ≤ ∞ and r > 1/p. Let further
d, n ∈ N. Then there is a constant Cr,p,d > 0 such that for
m > Cr,p,d n log3 (n + 1)
there is a non-linear recovery operator Am based on (rLasso) or (OMP) using m random samples
such that with high probability the following asymptotic bound holds
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log(n+1))^{(d-1)(r-2(\frac{1}{p}-\frac{1}{2}))+\frac{1}{2}}. \qquad (4.4)$$
Proof. From Theorems 3.7, 3.9 and the arguments from the proof of Corollary 4.14 in [18] (the class Wpr is the same as SprW(Td) in their notation) we choose M a dyadic number satisfying $n^{2r(r-1/p)^{-1}} \le M \le 2\,n^{2r(r-1/p)^{-1}}$. The corresponding (rLasso) or (OMP) decoder associated to this M, which uses m random samples, guarantees
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{1/2-1/q}\cdot\big(\sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty}\big),$$
where we used the upper bound for the best n-term trigonometric approximation from [36, Thm. 2.9] to balance both terms by the choice of M. This yields (4.4).
Remark 4.8 (Main rate sharp in Corollary 4.7). One can show the sharpness of the main rate of convergence in Corollary 4.7 using the fooling argument from [31, Theorem 23] (for d = 1). Actually, the main rate $n^{-(r-(1/p-1/q))}$ is optimal for both linear and nonlinear sampling recovery.
Note that in the region 1 < p < 2 < q < ∞, the recovery from arbitrary linear information of functions from the class Wpr in Lq always outperforms (also non-linear) sampling recovery in the main rate, i.e., λn(Wpr)Lq = o(ϱn(Wpr)Lq).
Interestingly, in the case 1/p + 1/q > 1 the Gelfand widths cn(Wpr)Lq decay faster in the main rate than the respective linear widths λn(Wpr)Lq. For 1/p + 1/q ≤ 1 it holds cn(Wpr)Lq ≍ λn(Wpr)Lq.
Remark 4.9. Let us compare the bounds for (rLasso) and (OMP) from Corollary 4.7 with those for other recovery methods. Here we assume that 1 < p < 2 < q < ∞; the case q = 2 will be discussed separately in Remark 4.10 below.
(i) [Comparison to (Smolyak)] In the paper [9, Cor. 7.1] an upper bound for the linear sampling numbers of Wpr (the same as S^r_{p,2}F(Td) with µ = d in their notation) in Lq has been given for the worst-case recovery using the linear Smolyak algorithm Sn,d, which for r > 1/p, 1 < p < q < ∞ yields that
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - S_{n,d}(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log n)^{(d-1)(r-\frac{1}{p}+\frac{1}{q})}. \qquad (4.5)$$
By the embedding Brp,p ↪ Wpr in case 1 < p < 2 < q < ∞ together with [13, Thm. 5.1,(ii)] we know that we cannot do better in Lq than (4.5) if we restrict to sparse grid (Smolyak) points.
Hence, our non-linear approach outperforms sparse grids if d is large and
2(1/p − 1/2) > 1/p − 1/q ⇐⇒ 1/p + 1/q > 1 .
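For completeness, the elementary computation behind this equivalence is
$$2\Big(\frac{1}{p}-\frac{1}{2}\Big) > \frac{1}{p}-\frac{1}{q} \;\Longleftrightarrow\; \frac{2}{p} - 1 > \frac{1}{p} - \frac{1}{q} \;\Longleftrightarrow\; \frac{1}{p} + \frac{1}{q} > 1 .$$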
(ii) [Comparison to (Lsqr)] In [22, Cor. 21] we obtain (4.5) for 1 < p < 2 < q < ∞ also with a different linear method, namely a plain least squares estimator based on subsampled random points involving the solution of the Kadison-Singer problem [26]. We do not know if the bound
given there is sharp and whether it may outperform (rLasso) or (OMP).
Remark 4.10 (L2 -estimates outperform any linear method). From Corollary 4.7 we obtain
the following important special case for q = 2, see also [18, Corollary 4.16],
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_2} \lesssim n^{-r+\frac{1}{p}-\frac{1}{2}}\,(\log(n+1))^{(d-1)(r-\frac{1}{p}+\frac{1}{2})}\,(\log(n+1))^{\frac{1}{2}-(d-1)(\frac{1}{p}-\frac{1}{2})}.$$
As mentioned in [18, Remark 4.17], for sufficiently large d the non-linear sampling numbers
decay faster in this situation than the respective linear widths, which coincide in the order of
decay with the linear sampling numbers.
Let us proceed with the case p > 2.
Corollary 4.11 (Left region including small smoothness). Let 2 ≤ p < ∞, 1 ≤ q < ∞. Then
there is a constant Cr,p,d > 0 such that with m = ⌈Cr,p,d n log3 (n + 1)⌉
$$\varrho_m(\mathbf{W}^r_p)_{L_q} \lesssim \begin{cases} n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log(n+1))^{(d-1)(1-r)+r}, & 1/p < r < 1/2,\\[2pt] n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log(n+1))^{(d-1)(1-r)+r}\,(\log\log n)^{r+1}, & r = 1/2,\\[2pt] n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log n)^{(d-1)r+\frac{1}{2}}, & r > 1/2. \end{cases} \qquad (4.6)$$
Proof. Since ∥·∥Lq ≤ ∥·∥L2 for q ≤ 2, it suffices to consider the case 2 ≤ q < ∞. Further, in order
to employ Theorems 3.7, 3.9, we need upper estimates for the quantities σn (Wpr ; T d )L∞ and
E[−M,M ]d ∩Zd (Wpr ; T d )L∞ . The rate of convergence of the respective best n-term approximation
width for 2 ≤ p < ∞ is
$$\sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty} \lesssim \begin{cases} n^{-r}\,(\log n)^{(d-1)(1-r)+r}, & 1/p < r < 1/2,\\[2pt] n^{-r}\,(\log n)^{(d-1)(1-r)+r}\,(\log\log n)^{r+1}, & r = 1/2,\\[2pt] n^{-r}\,(\log n)^{(d-1)r+1/2}, & r > 1/2. \end{cases} \qquad (4.7)$$
The case of small smoothness is known from [38, Theorems 6.1, 6.2], the big smoothness case
is taken from [36, Theorem 1.3], see also [12, Theorem 7.5.2].
In what follows we show that for an appropriately chosen M = M (n, r, p), the quantity
E[−M,M ]d ∩Zd (Wpr ; T d )L∞ decays faster than the respective best n-term approximation, see
Lemma 4.12 below.
Hence, Theorems 3.7, 3.9 yield the estimate
$$\varrho_{\lceil C_{r,p,d}\, n\, \log^3(n+1)\rceil}(\mathbf{W}^r_p)_{L_q} \le 2\, n^{1/2-1/q}\cdot \sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty}.$$
where the blocks ρ(s) are defined in (4.1).
In what follows we use Hölder's inequality and obtain
$$\begin{aligned}
&\sup_{\|f\|_{\mathbf{B}^r_{p,p}}\le 1}\ \sum_{\substack{s\in\mathbb{N}_0^d:\ \exists s_j\,:\, 2^{s_j}>M}} 2^{-|s|_1(r-\frac{1}{p})}\; 2^{r|s|_1}\Big\|\sum_{k\in\rho(s)}\hat f(k)\exp(2\pi i k\cdot x)\Big\|_p \\
&\qquad\le M^{-(r-\frac{1}{p})}\Big(\sum_{s\in\mathbb{N}_0^d} 2^{-|s|_1(r-\frac{1}{p})(1-\frac{1}{p})^{-1}}\Big)^{1-\frac{1}{p}}\ \sup_{\|f\|_{\mathbf{B}^r_{p,p}}\le 1}\|f\|_{\mathbf{B}^r_{p,p}} \;\lesssim\; M^{-(r-\frac{1}{p})}.
\end{aligned}$$
Choosing the parameter M such that $n^{2r(r-1/p)^{-1}} \le M \le 2\,n^{2r(r-1/p)^{-1}}$ implies (4.8).
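To see the effect of this choice of M, note the elementary computation
$$M^{-(r-\frac{1}{p})} \asymp \Big(n^{\frac{2r}{r-1/p}}\Big)^{-(r-\frac{1}{p})} = n^{-2r},$$
so the right-hand side above indeed decays faster than each of the best n-term rates in (4.7).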
Remark 4.13 (Left upper region – almost sharp). (i) For 1 < q < 2 < p < ∞, the order of the Gelfand widths is $c_n(\mathbf{W}^r_p)_{L_q} \asymp \lambda_n(\mathbf{W}^r_p)_{L_q} \asymp n^{-r}(\log n)^{(d-1)r}$ (see e.g. [12, Section 9.6]). With (rLasso) and (OMP) we obtain the same main rate but additional (d-independent) logarithms, i.e., the bound is almost optimal w.r.t. the Gelfand widths.
(ii) (Comparison to (Lsqr) and (Smolyak)) The sharp (w.r.t. Gelfand numbers) bound for (Lsqr) in the case 1 < q < 2 < p < ∞ was obtained in [22, Cor. 21]. Note that the approach in [22] requires square summability of the linear widths and covers only the case r > 1/2, whereas in [21] and [37] this condition can be avoided at the price of a d-independent logarithm.
(iii) In the region 1 < q < 2 < p < ∞, the right order for (Smolyak) behaves as $n^{-r}(\log n)^{(d-1)(r+1/2)}$ (see [12, Thm. 5.3.1] and references therein). In fact, by the embedding Brp,2 ↪ Wpr together with [13, Thm. 5.1,(ii)] we know that we cannot do better in Lq if we restrict to sparse grid points. For large d this estimate is worse than those for (rLasso) and (OMP) by logarithmic factors whose exponent grows with d.
Remark 4.14 (Left lower region – Smolyak is optimal). We will further distinguish two cases:
2 < p < q < ∞ (lower triangular) and 2 < q < p < ∞ (upper triangular).
(i) (Lower triangular) In this region we know the exact (w.r.t. Gelfand and linear widths) order (4.5) for (Smolyak) from the paper [9, Cor. 7.1], which is better in the main rate than the bound for (rLasso) and (OMP) (which is in turn better than the one for (Lsqr) from [22, Cor. 21]). Note that for p = 2 < q ≤ ∞ (Lsqr) gives the same (sharp) order of decay as (Smolyak).
(ii) (Upper triangular) For 2 < q < p < ∞ we do not know anything about the optimality of (linear and non-linear) sampling algorithms. (rLasso) gives estimates that are worse in the main rate than the Gelfand widths; in turn, the existing upper bounds for (Smolyak) and (Lsqr) are worse than those for the linear widths. Note that in this region the Gelfand numbers decay faster than the linear widths in the main rate.
Remark 4.15 (Right upper region – Smolyak is optimal among linear methods). The region
1 < p, q < 2 consists of two triangular areas: 1 < p < q < 2 (lower triangular) and 1 < q ≤
p < 2 (upper triangular). For 1 < p < q < 2, the bound for (Smolyak) [9, Cor. 7.1] coincides
with that for the linear widths. In the case 1 < q ≤ p < 2 we cannot say anything about optimality with respect to either the linear or the Gelfand widths.
Acknowledgement. The first named author MM is supported by the ESF, being co-financed
by the European Union and from tax revenues on the basis of the budget adopted by the
Saxonian State Parliament. KP would like to acknowledge support by the Philipp Schwartz
Fellowship of the Alexander von Humboldt Foundation and the German Research Foundation
(DFG 403/4-1). KP and TU would like to thank Ben Adcock for pointing out reference [7]
and bringing square root Lasso as a noise blind alternative to the authors’ attention during a
discussion within the Session Function recovery and discretization problems organized by David
Krieg and KP at the conference MCQMC24 in Waterloo (CA).
References
[1] B. Adcock, A. Bao, and S. Brugiapaglia. Correcting for unknown errors in sparse high-
dimensional function approximation. Numer. Math., 142(3):667–711, 2019.
[2] F. Bartel, M. Schäfer, and T. Ullrich. Constructive subsampling of finite frames with
applications in optimal function recovery. Appl. Comput. Harmon. Anal., 65:209–248,
2023.
[3] É. S. Belinskii. Approximation of functions of several variables by trigonometric polyno-
mials with given number of harmonics, and estimates of ϵ-entropy. Analysis Mathematica,
15:67–74, 1989.
[4] A. Belloni, V. Chernozhukov, and L. Wang. Square-root lasso: pivotal recovery of sparse
signals via conic programming. Biometrika, 98(4):791–806, 2011.
[5] A. Berk, S. Brugiapaglia, and T. Hoheisel. Square root LASSO: Well-posedness, Lipschitz
stability and the tuning trade off. SIAM Journal on Optimization, 34(3):2609–2637, 2024.
[6] J. Bourgain. An improved estimate in the restricted isometry problem. In Geometric as-
pects of functional analysis, volume 2116 of Lecture Notes in Math., pages 65–70. Springer,
Cham, 2014.
[7] S. Brugiapaglia, S. Dirksen, H. C. Jung, and H. Rauhut. Sparse recovery in bounded Riesz
systems with applications to numerical methods for PDEs. Applied and Computational
Harmonic Analysis, 53:231–269, 2021.
[8] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.
[9] G. Byrenheid and T. Ullrich. Optimal sampling recovery of mixed order Sobolev em-
beddings via discrete Littlewood–Paley type characterizations. Analysis Mathematica,
43:807–820, 2017.
[10] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approxima-
tion. J. Amer. Math. Soc., 22(1):211–231, 2009.
[11] J. Creutzig and P. Wojtaszczyk. Linear vs. nonlinear algorithms for linear problems.
Journal of Complexity, 20(6):807–820, 2004.
[12] D. Dũng, V. N. Temlyakov, and T. Ullrich. Hyperbolic cross approximation. Advanced
Courses in Mathematics. CRM Barcelona. Birkhäuser/Springer, Cham, 2018. Edited and
with a foreword by Sergey Tikhonov.
[13] D. Dũng and T. Ullrich. Lower bounds for the integration error for multivariate functions
with mixed smoothness and optimal Fibonacci cubature for functions on the square. Math.
Nachr., 288(7):743–762, 2015.
[14] F. Dai and V. N. Temlyakov. Random points are good for universal discretization. J.
Math. Anal. Appl., 529(1):Paper No. 127570, 28, 2024.
[15] M. Dolbeault, D. Krieg, and M. Ullrich. A sharp upper bound for sampling numbers in
L2 . Appl. Comput. Harmon. Anal., 63:113–134, 2023.
[16] S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Applied
and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
[17] I. Haviv and O. Regev. The restricted isometry property of subsampled Fourier matrices.
In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2014–2016, pages
163–179. Springer, 2017.
[18] T. Jahn, T. Ullrich, and F. Voigtlaender. Sampling numbers of smoothness classes via
ℓ1 -minimization. Journal of Complexity, 79:Paper No. 101786, 35, 2023.
[20] D. Krieg. Tractability of sampling recovery on unweighted function classes. Proc. Amer.
Math. Soc. Ser. B, 11:115–125, 2024.
[21] D. Krieg, K. Pozharska, M. Ullrich, and T. Ullrich. Sampling projections in the uniform
norm. arXiv:math/2401.02220, 2024.
[22] D. Krieg, K. Pozharska, M. Ullrich, and T. Ullrich. Sampling recovery in L2 and other
norms. arXiv:math/2305.07539, 2024.
[23] D. Krieg and M. Ullrich. Function values are enough for L2 -approximation. Found. Com-
put. Math., 21(4):1141–1151, 2021.
[24] S. Kunis and H. Rauhut. Random sampling of sparse trigonometric polynomials. II. Or-
thogonal matching pursuit versus basis pursuit. Found. Comput. Math., 8(6):737–763,
2008.
[26] A. W. Marcus, D. A. Spielman, and N. Srivastava. Interlacing families II: Mixed charac-
teristic polynomials and the Kadison-Singer problem. Ann. of Math. (2), 182(1):327–350,
2015.
[27] M. Moeller. Gelfand numbers and best m-term trigonometric approximation for weighted
mixed Wiener classes in L2 . Master’s thesis, TU Chemnitz, Germany, 2023.
[29] N. Nagel, M. Schäfer, and T. Ullrich. A new upper bound for sampling numbers. Found.
Comput. Math., 22(2):445–468, 2022.
[31] E. Novak and H. Triebel. Function spaces in Lipschitz domains and optimal rates of
convergence for sampling. Constructive Approximation, 23:325–350, 2005.
[33] H. B. Petersen and P. Jung. Robust instance-optimal recovery of sparse signals at unknown
noise levels. Inf. Inference, 11(3):845–887, 2022.
[34] S. A. Smolyak. Quadrature and interpolation formulas for tensor products of certain classes
of functions. Dokl. Akad. Nauk SSSR, 148:1042–1045, 1963.
[37] V. N. Temlyakov and T. Ullrich. Bounds on Kolmogorov widths and sampling recovery
for classes with small mixed smoothness. Journal of Complexity, 67:101575, 2021.
[38] V. N. Temlyakov and T. Ullrich. Approximation of functions with small mixed smoothness
in the uniform norm. Journal of Approximation Theory, 277:105718, 2022.
[40] T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Trans.
Inform. Theory, 57(9):6215–6221, 2011.
[41] A. Zygmund. Trigonometric series. Vol. I, II. Cambridge Mathematical Library. Cam-
bridge University Press, Cambridge, 2002.