Instance Optimal Function Recovery – asymptotic performance
Moritz Moeller a, Kateryna Pozharska a,b, Tino Ullrich a,∗
a Chemnitz University of Technology, Faculty of Mathematics
b Institute of Mathematics of NAS of Ukraine
Abstract
In this paper we study non-linear sampling recovery of multivariate functions using
techniques from compressed sensing. In the first part of the paper we prove that square
root Lasso (rLasso) with a particular choice of the regularization parameter λ > 0 as well
as orthogonal matching pursuit (OMP) after sufficiently many iterations provide noise blind
decoders which efficiently recover multivariate functions from random samples. In contrast
to basis pursuit the decoders (rLasso) and (OMP) do not require any additional information
on the width of the function class in L∞ and lead to instance optimal recovery guarantees.
In the second part of the paper we relate the findings to linear recovery methods such as least
squares (Lsqr) or Smolyak’s algorithm (Smolyak) and compare the performance in a model
situation, namely the approximation in Lq of periodic multivariate functions with Lp-bounded mixed derivative. The main observation is that (rLasso) and (OMP) outperform
Smolyak’s algorithm (sparse grids) in various situations, where 1 < p < 2 ≤ q < ∞. For
q = 2 they even outperform any linear method including (Lsqr) in combination with
recently proposed subsampled random points.
1 Introduction
This paper can be seen as a continuation of Jahn, T. Ullrich, Voigtlaender [18]. Here we
aim for a certain type of instance optimality when recovering a multivariate function f from
samples. The term instance optimality was coined by Cohen, Dahmen, DeVore [10]. Here
we use it in the context of function recovery from samples (decoding) and refer to an error
guarantee of type (1.3) and (1.4) which holds true for any instance f . A particular focus is put
on non-linear recovery methods (decoders) such as square root Lasso (rLasso), see Definition
3.1, and orthogonal matching pursuit (OMP), see Definition 3.2. The variants discussed here
use function samples at random points and provide recovery guarantees with high probability.
Square root Lasso (Least Absolute Shrinkage and Selection Operator) has been introduced by
Belloni, Chernozhukov and Wang [4], analyzed by H. Petersen, P. Jung [33] and already used for
function recovery in high dimensions by Adcock, Bao, Brugiapaglia [1]. The decoder (rLasso) turns out to be noise blind and does not require any further information about the function class to which the function f belongs. This is in contrast to the recently proposed variant of basis pursuit denoising investigated by the third named author together with Jahn and Voigtlaender [18], where we used certain widths in L∞ as a parameter for the ℓ1-minimization decoder.

Keywords and phrases: multivariate approximation; best m-term approximation; uniform norm; rate of convergence; sampling recovery.
2020 Mathematics subject classification: 42A10, 94A20, 41A46, 46E15, 42B35, 41A25, 41A17, 41A63
∗ Corresponding author, Email: [email protected]
We propose to use the recovery operator Rm,λ(·; X) based on the optimization program
(rLasso)
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \lambda\,\|Az - y\|_{\ell_2(m)}, \qquad (1.1)$$
where we choose λ = κ·√n. It will be a universal algorithm allowing for individual estimates
on the respective d-variate periodic function f ∈ C(Td ) of interest. The vector z ∈ CN will
later represent the coefficients in an appropriate basis expansion. To be more precise, for
many random samples X = {x1 , ..., xm } it holds for 2 ≤ q ≤ ∞ and all f ∈ C(Td ) that
$$\|f - R_{m,\kappa\sqrt{n}}(f;X)\|_{L_q} \le C\, n^{1/2-1/q}\big(\sigma_n(f;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(f;\mathcal{T}^d)_{L_\infty}\big) \qquad (1.3)$$
with high probability. See Section 2 for the notational conventions. A function is recovered from
the vector y = f (X) := (f (x1 ), ..., f (xm ))T ∈ Cm of point evaluations at random nodes where
the set of nodes is fixed in advance and is used for all functions f simultaneously. The task
is to solve the (rLasso) optimization program (1.1) with respect to a randomly subsampled
Fourier matrix A for the coefficient vector z of the approximant Rm,κ√n (f ; X). Practical
considerations like well-posedness, stability, etc. for (rLasso) have been published recently by
Berk, Brugiapaglia and Hoheisel [5].
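To make the optimization step concrete, the following is a minimal computational sketch of the program (1.1). It uses the convex-optimization package CVXPY, which is our own choice and not discussed in the paper; the matrix A, the sample vector y, the sparsity level n and the constant κ are assumed to be given, and the function name is ours. The sketch illustrates only the decoder, not the probabilistic analysis.

```python
import numpy as np
import cvxpy as cp

def solve_rlasso(A, y, n, kappa=1.0):
    """Sketch of the (rLasso) program (1.1) with lambda = kappa * sqrt(n)."""
    N = A.shape[1]
    z = cp.Variable(N, complex=True)            # coefficient vector in C^N
    lam = kappa * np.sqrt(n)                    # regularization parameter lambda
    objective = cp.Minimize(cp.norm(z, 1) + lam * cp.norm(A @ z - y, 2))
    cp.Problem(objective).solve()
    return z.value                              # a minimizer c^#(y)
```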
The situation is completely parallel for the greedy algorithm (OMP), see Definition 3.2 be-
low. The recovery guarantee (1.4) is an easy consequence of known results from compressed
sensing, see Foucart, Rauhut [16, Theorem 6.25] based on Zhang [40], Haviv, Regev [17] and
Brugiapaglia, Dirksen, Jung, Rauhut [7], together with the approach from [18]. The iterative
procedure used here is denoted with Pm,k indicating the k greedy steps and the m used samples.
The additional key feature is the fact that the approximant is always k-sparse. This property
is not present for (rLasso). The corresponding version of (1.3) is
$$\|f - P_{m,24n}(f;X)\|_{L_q} \le C\, n^{1/2-1/q}\big(\sigma_n(f;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(f;\mathcal{T}^d)_{L_\infty}\big) \qquad (1.4)$$
provided that (1.2) holds for the number of samples. Note that recently Dai, Temlyakov [14] considered weak orthogonal matching pursuit, which involves an additional weakness parameter in the greedy selection step. Remarkably, they obtained a similar control of the greedy approximation error in terms of the best n-term approximation, which leads to analogous results. Note that there are various numerical implementations of (OMP), see, e.g., Kunis, Rauhut [24]. Their implementation is based on the non-equispaced fast Fourier transform (NFFT).
We put our findings into perspective to other contemporary sampling recovery methods such
as sparse grids (Smolyak) and linear methods based on least squares with respect to hyperbolic
crosses on subsampled random points (Lsqr), see Figure 2. Our results are collected in Figure
1 below which illustrates the regions in the (1/p, 1/q) parameter domain for our model scenario
on the d-torus, namely spaces with Lp -bounded mixed derivative, denoted with Wpr , where the
error is measured in Lq . The different methods are known to be optimal, close to optimal or
at least superior to others. As optimality measure we use the classical notion of sampling numbers introduced in (2.1) below. The picture is only partially complete, which means that there are many open problems to which the reader is invited to contribute.
We consider mixed Wiener spaces Armix on the d-torus and function classes with bounded
mixed derivative Wpr as surveyed in Dũng, Temlyakov, T. Ullrich [12, Chapt. 2]. These spaces
have a relevant history in the former Soviet Union and serve as a powerful model for multivariate
approximation. Concretely, we study the situation Wpr in Lq where 1 < p ≤ 2 ≤ q and the case
of small smoothness 2 < p < ∞ and 1/p < r ≤ 1/2. We consider the worst-case setting where
the error is measured in Lq . It turned out in [18], see also Moeller, Stasyuk and the third named
author [28], that for several classical smoothness spaces non-linear recovery in L2 outperforms
any linear method (not only sampling). The results in this paper show that this effect partially
extends to Lq with 2 ≤ q < ∞. In fact, functions in mixed weighted Wiener spaces Armix provide an intrinsic sparsity with respect to the trigonometric system, so the additional gain in the rate does not come as a surprise. If r > 1/2 it holds for m ≳ Cr,d n log^3(n + 1)
that there is a non-linear recovery map Am based on (OMP) or (rLasso) using random points,
such that
$$\sup_{\|f\|_{\mathbf{A}^r_{\mathrm{mix}}}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r+1/q)}\,(\log(n+1))^{(d-1)r+1/2}$$
with high probability. We determine a polynomial rate of convergence r + 1/q which is sharp at least in the main rate (apart from logarithms) and outperforms any linear algorithm. The
situation is not so clear when studying Wpr classes in Lq . Surprisingly, in case 1 < p < 2 < q
and 1/p+1/q > 1 square root Lasso and orthogonal matching pursuit outperform any sampling
algorithm based upon sparse grids if d is large. The acceleration only happens in the logarithmic
term. We obtain for r > 1/p and m ≳ Cr,d n log3 (n + 1) a non-linear recovery map based on
(rLasso) and (OMP) using random samples such that
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log(n+1))^{(d-1)(r-2(\frac{1}{p}-\frac{1}{2}))+\frac{1}{2}}$$
with high probability. The result shows that for q = 2 and d large (rLasso) and (OMP) have
a faster asymptotic decay than any linear method and in particular (Lsqr). This effect has
been observed already for basis pursuit denoising in [18]. Note that the described effects do not appear when it comes to the uniform norm, i.e., q = ∞. This is a consequence of a general
result, described in Novak, Wozniakowski [32, Chapter 4.2.2], see also Remark 3.12.
The bound in (1.3) has the striking advantage that one may directly insert known bounds
from the literature, see [3, 38] and [12, Section 7] for an overview. Other approaches, like in
[20], require the embedding of the function class into the multivariate Wiener algebra A, which
is not always the case, not even for classical smoothness spaces like Sobolev spaces $W_p^{1/2}$ for
p > 2. This non-trivial fact sharpens Bernstein’s classical result on the absolute convergence of
Fourier series from functions in Hölder-Zygmund spaces, see Zygmund [41, Theorem VI.3.1,page
240] and the references therein, and will be proven in a forthcoming paper by the authors.
Smolyak’s sparse grids [34] in connection with functions providing bounded mixed derivative
or difference have a significant history not only for approximation theory, see [35], [12] and the
references therein, but also in scientific computing, see Bungartz, Griebel [8]. The underlying
spaces not only serve as a powerful model for multivariate approximation theory motivated by practical problems; sparse grid algorithms also allow for good (and sometimes optimal) approximation rates with significantly fewer sampling points. This approach is strongly related to hyperbolic cross approximation. In Figure 1 we indicate the parameter regions where (Smolyak) is known
to be optimal with respect to Gelfand/approximation numbers.
Finally, we would like to mention the recent developments in direction of least squares
methods (Lsqr). Beginning from the breakthrough result by Krieg, M. Ullrich [23], where it
was shown that sampling recovery for reproducing kernel Hilbert spaces in L2 is asymptotically
equally powerful as linear approximation, authors improved both the algorithms [29, 2] and the error guarantees [25], until the remaining logarithmic gap was finally closed by M. Dolbeault, D. Krieg, M. Ullrich [15] for RKHS which are sufficiently compact in L2. In Nagel, Schäfer, T. Ullrich [29] subsampled random points appeared for the first time. The final solution [15] is again heavily based on the solution of the Kadison-Singer problem [26] and is, however, highly non-constructive. As for the classical problem W2r in L2 (the midpoint in Figure 1), the algorithm uses the basis functions from the hyperbolic cross (leftmost picture in Figure 2) with n frequencies. The nodes on the spatial side result from a random draw (O(n log n)) together with
a subsampling to |X| = O(n) points (fourth picture in Figure 2). The resulting overdetermined
matrix is then used to recover the coefficients from the sample vector y = f (X) via a weighted
least squares algorithm. Apart from the Hilbert space setting, the situation Wpr in Lq has been
investigated in Krieg, Pozharska, M. Ullrich, T. Ullrich [21, 22].
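As an illustration of the least-squares pipeline just described, the sketch below builds a hyperbolic-cross frequency set, draws random points, and fits the coefficients by plain (unweighted) least squares. The constructive subsampling step and the weights from the cited works are omitted, and all names and parameters are our own simplification.

```python
import itertools
import numpy as np

def hyperbolic_cross(n, d, K):
    """Frequencies k in [-K, K]^d with prod_j max(1, |k_j|) <= n."""
    grid = range(-K, K + 1)
    return np.array([k for k in itertools.product(grid, repeat=d)
                     if np.prod([max(1, abs(kj)) for kj in k]) <= n])

def lsqr_recover(f, n, d, K, m, rng=np.random.default_rng()):
    """Plain least squares on m random points with hyperbolic-cross frequencies."""
    freqs = hyperbolic_cross(n, d, K)                     # frequency index set
    X = rng.random((m, d))                                # uniform random points on [0, 1)^d
    A = np.exp(2j * np.pi * X @ freqs.T) / np.sqrt(m)     # sampled Fourier matrix
    y = np.array([f(x) for x in X]) / np.sqrt(m)          # scaled sample vector
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)        # least squares coefficient fit
    return freqs, coeffs
```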
Notation. For a number a, by a+ we denote max{a, 0}, and by log(a) its natural logarithm. C^n shall denote the complex n-space and C^{m×n} the set of complex m × n matrices. Vectors and matrices are usually typeset boldface. For a vector v ∈ C^N and a set S ⊂ {1, ..., N} we mean by v_S ∈ C^N the restriction of v to S, where all other entries are set to zero. We denote by $\hat f(k) = \int_{\mathbb{T}^d} f(x)\exp(-2\pi i k\cdot x)\,dx$ the Fourier coefficient with respect to the frequency k ∈ Z^d and indicate by f ∈ T([−M, M]^d) that f is a trigonometric polynomial with support on the frequencies in the set [−M, M]^d ∩ Z^d. The notation Lq := Lq(T^d), 1 ≤ q < ∞, indicates the classical Lebesgue space of periodic functions on the d-torus T^d = [0, 1]^d, with the usual modifications for q = ∞. The notation C(T^d) stands for the space of continuous periodic functions on T^d with the sup-norm. All other function spaces of d-dimensional functions will be typeset boldface. With ℓ_p(N) we denote C^N quasi-normed by $\|x\|_p := \big(\sum_{k=1}^{N} |x_k|^p\big)^{1/p}$. Let X and Y be two normed spaces. The norm of an element x in X will be denoted by ∥x∥_X. The symbol X ↪ Y indicates that the identity operator from X to Y is continuous. For two sequences a_n and b_n we will write a_n ≲ b_n if there exists a constant c > 0 such that a_n ≤ c b_n for all n ∈ N. We will write a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n. The involved constants do not depend on n but may depend on other parameters.

[Figure 1: Regions in the (1/p, 1/q) plane for ϱ_n(W_p^r, L_q); in-figure labels: "Lsqr, rLasso, OMP (Remark 4.13)", "Smolyak (Remark 4.15)", "rLasso, OMP (Rem. 4.8, 4.9)", "Smolyak (Remark 4.14)". Magenta area: comparison to Smolyak, optimality not clear. Orange area: optimality w.r.t. Gelfand widths. Green area: optimality w.r.t. linear widths.]
[Figure 2: Hyperbolic cross in the frequency domain [−32, 32]^2 ∩ Z^2 and different sampling designs in d = 2; panels: hyperbolic cross, sparse grid, full grid, random + subsampling, random points.]
The sampling numbers ϱm(F)Y of a class F of continuous functions f : Ω → C, which is continuously embedded into Y, are defined as follows. For m > 1 define
$$\varrho_m(\mathbf{F})_Y := \inf_{X=\{x^1,\dots,x^m\}\subset\Omega}\ \inf_{R:\mathbb{C}^m\to Y}\ \sup_{f\in\mathbf{F}} \big\|f - R\big(f(x^1),\dots,f(x^m)\big)\big\|_Y . \qquad (2.1)$$
If one restricts to linear recovery operators R : C^m → Y, then the corresponding quantities are denoted by ϱ^lin_m(F)_Y and λ_m(F)_Y. In other words, we look for optimal linear operators with rank not exceeding m, i.e.,
$$\lambda_m(\mathbf{F})_Y := \inf_{\substack{A:\mathbf{F}\to Y\ \text{linear},\ \operatorname{rank} A\le m}}\ \sup_{f\in\mathbf{F}} \|f - A(f)\|_Y .$$
It is well known that linear algorithms are optimal if Y = L∞, see Novak, Wozniakowski [32, 4.2.2] and Remark 3.12.
Let I denote a countable index set and B = {bk ∈ C(Ω) : k ∈ I} a dictionary consisting of
continuous functions. Note that often the additional requirement is needed that the functions
in B are universally bounded in L∞ . For n ∈ N, we define the set of linear combinations of n
elements of B as
$$\Sigma_n := \Big\{\, \sum_{j\in J} c_j\, b_j(\cdot) \;:\; J\subset I,\ |J|\le n,\ (c_j)_{j\in J}\in\mathbb{C}^J \,\Big\}.$$
Note that the set Σn is “non-linear” (not a vector space), whereas the space V_J := span{b_j : j ∈ J} is linear. When dealing with Y = Lq(Ω, µ), 1 ≤ q ≤ ∞, for a Borel measure µ on Ω it is often desirable that
the bk(·) are pairwise orthogonal with respect to µ. We denote by
$$\sigma_n(f; B)_Y := \inf_{g\in\Sigma_n} \|f - g\|_Y \quad\text{and}\quad \sigma_n(\mathbf{F}; B)_Y := \sup_{f\in\mathbf{F}} \sigma_n(f; B)_Y$$
the corresponding width with respect to F. Let further
$$E_J(f; B)_Y := \inf_{g\in V_J} \|f - g\|_Y \quad\text{and}\quad E_J(\mathbf{F}; B)_Y := \sup_{f\in\mathbf{F}} E_J(f; B)_Y$$
denote the linear best approximation error for f as well as for the entire class F.
We may use an enumeration of [−D, D]^d ∩ Z^d = {k1, ..., kN} with N = (2D + 1)^d and define
the enumerated multivariate Fourier system as ej (·) := exp(2πikj ·), j = 1, ..., N . We will write
$$A = \frac{1}{\sqrt{m}}\,\big(e_j(x^\ell)\big)_{1\le \ell\le m,\ 1\le j\le N}.$$
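A direct transcription of this matrix into code, with our own enumeration of the frequency cube, could look as follows.

```python
import itertools
import numpy as np

def subsampled_fourier_matrix(X, D):
    """A = (1/sqrt(m)) * (exp(2*pi*i k_j . x_l))_{l=1..m, j=1..N}, k_j in [-D, D]^d."""
    m, d = X.shape
    freqs = np.array(list(itertools.product(range(-D, D + 1), repeat=d)))  # N = (2D+1)^d frequencies
    A = np.exp(2j * np.pi * X @ freqs.T) / np.sqrt(m)
    return A, freqs
```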
The following specifically tailored version of the multivariate de la Vallée Poussin mean will
be of use. In our special setting the operator VM takes the place of the quasi-projection P
which has been used in [18, Sect. 3.1]. The key features of the construction below are that
the operator is the identity on T ([−M, M ]d ) and the fact that it has a universally bounded
operator norm from L∞ to L∞ with respect to M and d. Indeed, from [18, Sect. 3.1] we obtain
$$\|V_M\|_{L_\infty\to L_\infty} \le \Big(1+\frac{1}{d}\Big)^{d} \le e, \qquad (2.2)$$
for the operator defined by
$$V_M(f)(x) = \sum_{k\in\mathbb{Z}^d} \hat f(k)\, v_k \exp(2\pi i k\cdot x), \qquad (2.3)$$
with weights $v_k = \prod_{j=1}^{d} v_{k_j}$ satisfying
$$v_{k_j} = \begin{cases} 1, & |k_j| \le M,\\[2pt] \dfrac{(2d+1)M - |k_j|}{2dM}, & M < |k_j| \le (2d+1)M,\\[2pt] 0, & |k_j| > (2d+1)M. \end{cases} \qquad (2.4)$$
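A direct transcription of the weights (2.4) into code, with variable names of our choosing, reads:

```python
def vdp_weight(k, M, d):
    """Tensorized de la Vallee Poussin weight v_k = prod_j v_{k_j} from (2.4)."""
    v = 1.0
    for kj in k:
        a = abs(kj)
        if a <= M:
            continue                                    # factor 1
        if a <= (2 * d + 1) * M:
            v *= ((2 * d + 1) * M - a) / (2 * d * M)    # linear decay on the transition zone
        else:
            return 0.0                                  # vanishes outside [-(2d+1)M, (2d+1)M]^d
    return v
```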
3 Instance optimal function recovery – guarantees
We will consider two different nonlinear decoders, square root Lasso (rLasso) and orthogonal
matching pursuit (OMP). As for the first one see H. Petersen, Jung [33] and the references therein.
The advantage of (rLasso) over basis pursuit denoising as used in [18] is its “noise blindness”, which means that we do not have to incorporate additional information from the function class f belongs to. This feature is also present for greedy methods such as (OMP), see Foucart, Rauhut [16, 6.4] or Dai, Temlyakov [14, Paragraph after Cor. 1.2].
We will tailor square root Lasso and orthogonal matching pursuit to the function recovery
problem. For the general scenario described above, the decoder maps Rm,λ : C(Ω) → C(Ω)
and Pm,k : C(Ω) → C(Ω) are chosen in the following way. We fix a finite index set J and X = {x^1, . . . , x^m} ⊂ Ω and set
$$A := \frac{1}{\sqrt{m}}\,\big(b_j(x^\ell)\big)_{1\le\ell\le m,\ j\in J} \in \mathbb{C}^{m\times|J|} \qquad (3.1)$$
and $y = f(X)/\sqrt{m} \in \mathbb{C}^m$.
Definition 3.1 (rLasso). Let λ > 0 and m ∈ N. Put
$$R_{m,\lambda}(f; X) := \sum_{j\in J} (c^\#(y))_j\, b_j(\cdot) \in V_J \subset L_\infty, \qquad (3.2)$$
where c# (y) ∈ C|J| is any (fixed) solution of the square root Lasso minimization problem
with respect to the matrix (3.1) and the vector of samples y ∈ Cm . This defines a (not neces-
sarily linear) map Rm,λ : C(Ω) → C(Ω). The parameter λ > 0 is chosen below and may depend
on other parameters.
Definition 3.2 (OMP). Let k ∈ N and let J, A, X and y be as above. Then
$$P_{m,k}(f; X) := \sum_{j\in J} (c^k(y))_j\, b_j(\cdot) \in V_J \subset L_\infty, \qquad (3.4)$$
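For orientation, a minimal sketch of the standard (OMP) iteration from [16, Chapt. 3], which produces a k-sparse coefficient vector c^k(y) after k greedy steps, is given below; the variable names are ours and details of the exact variant analyzed here may differ.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: k greedy steps, returns a k-sparse vector in C^N."""
    m, N = A.shape
    support, residual = [], y.astype(complex)
    c = np.zeros(N, dtype=complex)
    c_S = np.zeros(0, dtype=complex)
    for _ in range(k):
        # greedy step: pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        if j not in support:
            support.append(j)
        # orthogonal projection: least squares on the current support
        c_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ c_S
    c[support] = c_S
    return c
```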
Definition 3.3 (ℓq-robust null space property). Given 1 ≤ q < ∞, m, N ∈ N and a norm ∥·∥ on C^m, the matrix A ∈ C^{m×N} satisfies the ℓq-robust null space property of order n < N if there exist constants 0 < ϱ < 1 and τ > 0 such that for all v ∈ C^N and all S ⊂ [N] with |S| ≤ n
$$\|v_S\|_{\ell_q} \le \frac{\varrho}{n^{1-1/q}}\,\|v_{S^c}\|_{\ell_1} + \tau\,\|Av\| .$$
We will use this property in the following proposition which is a direct consequence of H.
Petersen and P. Jung [33, Theorem 3.1].
Proposition 3.4. Let A ∈ C^{m×N} be a matrix satisfying the ℓ2-robust null space property of order n in the form
$$\|v_S\|_{\ell_2} \le \varrho\, n^{-1/2}\,\|v_{S^c}\|_{\ell_1} + \tau\,\|Av\|_{\ell_2}.$$
Then there is a constant κ > 0 (depending only on τ) such that for any y ∈ C^m and c ∈ C^N a solution c^# ∈ C^N of the (rLasso) minimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \kappa\sqrt{n}\,\|Az - y\|_{\ell_2(m)} \qquad (3.7)$$
satisfies
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad (3.8)$$
and
$$\|c - c^\#\|_{\ell_2} \le \beta\,\frac{\sigma_n(c)_{\ell_1}}{\sqrt{n}} + \delta\,\|Ac - y\|_{\ell_2}, \qquad (3.9)$$
where
$$\sigma_n(c)_{\ell_1} := \inf_{z\in\mathbb{C}^N,\ \|z\|_{\ell_0}\le n} \|c - z\|_{\ell_1},$$
with ∥z∥ℓ0 := |{1 ≤ j ≤ N : zj ̸= 0}|. The constants β, δ > 0 only depend on ϱ and τ .
Proof. [33, Theorem 3.1] says we may choose λ > τ√n in the optimization program (3.7) to obtain (3.9). Clearly, the ℓ2-robust null space property w.r.t. the Euclidean norm ∥·∥2 implies the ℓ1-robust null space property with modified norm ∥·∥ = n^{1/2}∥·∥_{ℓ2(m)}. Again [33, Theorem 3.1] says that we may choose λ > τ in the modified optimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \lambda\,\|Az - y\|, \qquad \|\cdot\| := \sqrt{n}\,\|\cdot\|_{\ell_2(m)},$$
to obtain
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\,\|Ac - y\| = \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2},$$
which is (3.8). Hence, (3.7) works for q = 1 and q = 2 simultaneously and yields the bounds (3.9) and (3.8).
Theorem 3.5. There exist universal constants α, β, γ, δ, κ > 0 such that the following holds
true. Let D ∈ N, N = (2D + 1)^d and n, m ∈ N satisfy (3.10). Then, with probability at least 1 − N^{−γ log(n+1)} with respect to the random draw of the sampling nodes X = {x^1, . . . , x^m} used to build the subsampled Fourier matrix A, the following holds: given c ∈ C^N and y ∈ C^m, any solution c^# ∈ C^N of the (rLasso) minimization problem
$$\min_{z\in\mathbb{C}^N} \|z\|_{\ell_1(N)} + \kappa\sqrt{n}\,\|Az - y\|_{\ell_2(m)} \qquad (3.11)$$
satisfies
$$\|c - c^\#\|_{\ell_1} \le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad (3.12)$$
and
$$\|c - c^\#\|_{\ell_2} \le \beta\,\frac{\sigma_n(c)_{\ell_1}}{\sqrt{n}} + \delta\,\|Ac - y\|_{\ell_2}. \qquad (3.13)$$
Note that since N ≥ 2, the number 1 − N^{−γ log(n+1)}, and therefore also the probability of choosing a vector of “good” sampling points X = {x^1, . . . , x^m}, is close to 1.
Proof. Choosing α large enough in (3.10) ensures that A has RIP of order 2n with RIP constant
δ2n < 1/3 with the mentioned probability, see [17, Theorem 3.7]. In fact, it holds for all c ∈ CN
with ∥c∥0 ≤ 2n that
$$(1-\delta_{2n})\,\|c\|_2^2 \le \|Ac\|_2^2 \le (1+\delta_{2n})\,\|c\|_2^2. \qquad (3.14)$$
By Theorem 3.6 below we have that A then provides the ℓ2-robust null space property (NSP) of order n with constants ϱ, τ depending only on δ2n < 1/3. Finally, we apply Proposition 3.4
and conclude the proof.
Theorem 3.6 (RIP implies robust NSP). For A ∈ C^{m×N} assume that A satisfies RIP with δ2n < 1/3, see (3.14). Then A satisfies the ℓ2-robust null space property (NSP) of order n, i.e.,
$$\|v_S\|_2 \le \frac{\rho}{\sqrt{n}}\,\|v_{S^C}\|_1 + \tau\,\|Av\|_2 \qquad \forall v\in\mathbb{C}^N,\ \forall S\subset[N],\ |S|\le n, \qquad (3.15)$$
with constants 0 < ρ < 1 and τ > 0 depending only on δ2n.
Proof. Let v ∈ CN . For v ∈ ker A \ {0}, it is enough to consider S = Jn (v) (index set of largest
entries of v in absolute value). We partition [N ] into the index sets
S0 := S = Jn(v),  S1 := J2n(v) \ Jn(v),  S2 := J3n(v) \ J2n(v),  . . . ,
and note that it is enough to show the ℓ2-robust NSP for S0 = S = Jn(v). We estimate as follows:
$$\begin{aligned}
\|v_S\|_2^2 &\le \frac{1}{1-\delta_n}\,\|Av_S\|_2^2 = \frac{1}{1-\delta_n}\Big\langle Av_{S_0},\, Av - \sum_{k\ge 1} Av_{S_k}\Big\rangle \\
&\le \frac{1}{1-\delta_n}\Big(|\langle Av_{S_0}, Av\rangle| + \sum_{k\ge 1} |\langle Av_{S_0}, Av_{S_k}\rangle|\Big) \\
&\le \frac{1}{1-\delta_n}\Big(\|Av_{S_0}\|_2\,\|Av\|_2 + \delta_{2n}\sum_{k\ge 1}\|v_{S_0}\|_2\,\|v_{S_k}\|_2\Big) \\
&\le \frac{1}{1-\delta_n}\Big(\sqrt{\delta_n+1}\,\|v_{S_0}\|_2\,\|Av\|_2 + \delta_{2n}\,\|v_{S_0}\|_2\cdot\frac{1}{\sqrt{n}}\sum_{k\ge 1}\|v_{S_{k-1}}\|_1\Big).
\end{aligned}$$
After division by ∥vS∥2 and Hölder's inequality, this yields
$$\|v_S\|_2 \le \frac{\sqrt{1+\delta_{2n}}}{1-\delta_{2n}}\,\|Av\|_2 + \frac{\delta_{2n}}{1-\delta_{2n}}\,\frac{1}{\sqrt{n}}\big(\|v_S\|_1 + \|v_{S^C}\|_1\big).$$
Since ∥vS∥1 ≤ √n ∥vS∥2 and δ2n < 1/3, the term involving ∥vS∥1 can be absorbed into the left-hand side, which gives (3.15) with ρ = δ2n/(1 − 2δ2n) < 1 and τ = √(1+δ2n)/(1 − 2δ2n).
Theorem 3.7. There exist universal constants C, α, κ, γ > 0 such that the following holds for M, n ∈ N, where we put D := (2d + 1)M. Drawing at least
where Rm,κ√n denotes the (rLasso) decoder from Definition 3.1 such that the approximant is con-
tained in the space of trigonometric polynomials T ([−(2d + 1)M, (2d + 1)M ]d ).
Proof. To prove Theorem 3.7 for 2 ≤ q ≤ ∞ we first derive the L∞-bound for the worst-case error and combine it via interpolation with the L2-bound.
For the L∞-bound we will use the control over ∥c − c#∥ℓ1 in Theorem 3.5, whereas the control on ∥c − c#∥ℓ2 serves for the L2-bound. Let ε > 0. Take an arbitrary f ∈ C(Td) and let f* = VM s, for s such that ∥f − s∥L∞ ≤ σn(f; T^d)L∞ + ε. The coefficient vector c of f* is n-sparse. We also set y = f(X)/√m and e = (f(X) − f*(X))/√m. Hence ∥Ac − y∥2 = ∥e∥ℓ2 ≤ ∥f(X) − f*(X)∥ℓ∞. Then, taking into account the boundedness of the Fourier system (see Section 2), we have from Theorem 3.5
$$\begin{aligned}
\|f^* - R_{m,\lambda}(f; X)\|_{L_\infty} &\le \sum_{j=1}^{N} \big|c_j - c^\#_j(y)\big|\,\|e_j(\cdot)\|_{L_\infty} \le \|c - c^\#\|_{\ell_1} \\
&\le \beta\,\sigma_n(c)_{\ell_1} + \delta\sqrt{n}\,\|Ac - y\|_{\ell_2} \qquad\qquad (3.16)\\
&\le \delta\sqrt{n}\,\|f(X) - f^*(X)\|_{\ell_\infty}.
\end{aligned}$$
It remains to verify the second estimate in (3.17), which is a standard computation. We decided to provide the short proof for the convenience of the reader. Let g ∈ T([−M, M]^d) denote an arbitrary trigonometric polynomial. Clearly, VM g = g and therefore
$$\|f - V_M f\|_{L_\infty} = \|f - g + g - V_M f\|_{L_\infty} = \|f - g - V_M(f - g)\|_{L_\infty} \le \|f - g\|_{L_\infty} + \|V_M(f - g)\|_{L_\infty} \le (1 + e)\,\|f - g\|_{L_\infty}. \qquad (3.19)$$
Finally, for 2 < q < ∞ we use the interpolation inequality
$$\|f - R_{m,\lambda}(f; X)\|_{L_q} \le \|f - R_{m,\lambda}(f; X)\|_{L_2}^{1-\theta}\, \|f - R_{m,\lambda}(f; X)\|_{L_\infty}^{\theta} = \|f - R_{m,\lambda}(f; X)\|_{L_2}^{2/q}\, \|f - R_{m,\lambda}(f; X)\|_{L_\infty}^{1-2/q},$$
where the interpolation parameter θ has to be chosen in such a way that 1/q = (1 − θ)/2 + θ/∞
which yields θ = 1 − 2/q. This concludes the proof.
Proof. We combine [17, Theorem 3.7] and [16, Theorem 6.25]. Precisely, choosing α large
enough in (3.22) ensures that A has RIP of order 13n with RIP-constant δ13n < 1/6, see (3.14)
and [17, Theorem 3.7]. This is required in [16, Theorem 6.25] to guarantee the recovery bounds
(3.23), (3.24).
where Pm,k denotes the (OMP) decoder from Definition 3.2 after k iterations.
Proof. The proof is completely analogous to the proof of Theorem 3.7. This time we use
Proposition 3.8 instead of Theorem 3.5.
Corollary 3.10. Let F ,→ C(Td ) denote a function class compactly embedded into the space
of continuous functions on the d-torus. Let further d, m, n, M ∈ N such that M ≥ d and
where the quantity ϱm(F)Lq denotes the m-th sampling width and is defined in (2.1). The constants C, α are inherited from either Theorem 3.9 or Theorem 3.7.
Remark 3.11. Similarly to Krieg [20, Lemma 9] one may prove a version of the above result
which looks as follows. Under the condition (1.2) we have
where Rm denotes either the (rLasso) decoder Rm,κ√n or the (OMP) decoder Pm,24n. Note that this version differs from the one in [20, Lemma 9], since the author does not use the
L∞ -best approximation on the right-hand side. Let us emphasize that the bounds in Theorems
3.7 and 3.9 have the advantage that one can directly insert known bounds for L∞ widths without
relying on the embedding into the Wiener algebra A, see Temlyakov [36] and Temlyakov, T.
Ullrich [38], as well as Dũng, Temlyakov, T. Ullrich [12, Chapt. 4, 7]. This is relevant in
situations, when the function class is not embedded into A, as, for instance, the space Wpr with
p > 2 and 1/p < r ≤ 1/2. Indeed, this space is compactly embedded into C(Td ) but not in A,
which will be proved in a forthcoming paper by the authors.
Remark 3.12 (Linear recovery in L∞). Note that, by a general fact (see [32, Chapter 4.2.2], also [11]), if the target space is Y = L∞ (as in our case) and one has established an estimate for the non-linear sampling numbers, then there exists a linear algorithm with the same error bound. However, we only know of the existence of such an algorithm, without any deterministic construction.
4 Examples
We will now discuss examples where Theorems 3.7 and 3.9 improve existing results
in certain directions. We start in Subsection 4.1 with the mixed Wiener spaces Armix , a gener-
alization of the classical Wiener algebra A. These have been studied a lot due to their good
embedding properties and their connection to Barron classes. Recent work on these spaces and their approximation properties includes Jahn, T. Ullrich and Voigtlaender [18]; Kolomoitsev, Lomako, Tikhonov [19]; Krieg [20]; Moeller [27]; Moeller, Stasyuk and T. Ullrich [28]; V.K. Nguyen, V.N. Nguyen and Sickel [30]; and others.
Definition 4.1. For r ≥ 0 we define the mixed Wiener space Armix as the space of functions f ∈ L1(Td) with finite norm
$$\|f\|_{\mathbf{A}^r_{\mathrm{mix}}} := \sum_{k\in\mathbb{Z}^d} \prod_{i=1}^{d} (1 + |k_i|)^r\, |\hat f(k)|,$$
where fˆ(k) are the respective Fourier coefficients. In the univariate case we use the notation Ar, since the smoothness is no longer mixed. In the case r = 0 we get the Wiener algebra, which will be denoted in what follows by A.
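For a trigonometric polynomial with finitely many non-zero Fourier coefficients, this norm can be evaluated directly; a small helper of our own making is:

```python
import numpy as np

def wiener_mix_norm(fhat, r):
    """||f||_{A^r_mix} = sum_k prod_i (1 + |k_i|)^r |fhat(k)| for a dict {k: fhat(k)}."""
    return sum(np.prod([(1 + abs(ki)) ** r for ki in k]) * abs(c)
               for k, c in fhat.items())

# example: f(x) = 1 + 0.5i * exp(2*pi*i (x_1 - 2 x_2))
# wiener_mix_norm({(0, 0): 1.0, (1, -2): 0.5j}, r=1.0)
```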
In Subsection 4.2 we investigate how and in which cases the (rLasso) can beat linear
algorithms for spaces of functions with bounded mixed derivative defined in the following way.
Define for x ∈ T and r > 0 the univariate Bernoulli kernel
$$F_r(x) := 1 + 2\sum_{k=1}^{\infty} k^{-r}\cos(2\pi kx) = \sum_{k\in\mathbb{Z}} \max\{1,|k|\}^{-r}\exp(2\pi i kx)$$
and define the multivariate Bernoulli kernels as $F_r(x) := \prod_{j=1}^{d} F_r(x_j)$, x ∈ T^d.
Definition 4.2. Let r > 0 and 1 < p < ∞. Then Wpr is defined as the normed space of all
elements f ∈ Lp (Td ) which can be written as
$$f = F_r * \varphi := \int_{\mathbb{T}^d} F_r(\cdot - y)\,\varphi(y)\,dy$$
for some φ ∈ Lp (Td ), equipped with the norm ∥f ∥Wpr := ∥φ∥Lp (Td ) .
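Since convolution with F_r acts as a Fourier multiplier, the definition can also be read on the Fourier side; for p = 2, Parseval's identity gives the explicit expression
$$\|f\|_{\mathbf{W}^r_2}^2 = \sum_{k\in\mathbb{Z}^d} \Big(\prod_{j=1}^{d} \max\{1,|k_j|\}^{2r}\Big)\, |\hat f(k)|^2,$$
which in particular exhibits W^r_2 as a reproducing kernel Hilbert space on T^d for r > 1/2.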
In order to prove the statements, we will use embeddings of Armix and Wpr into the Besov
spaces Brp,θ of functions with bounded mixed differences.
Definition 4.3. Let r ≥ 0, 1 ≤ θ ≤ ∞, 1 < p < ∞. Then the periodic Besov space Brp,θ with
mixed smoothness is defined as the normed space of all elements f ∈ Lp (Td ), endowed with the
norm (with the usual modifications if θ = ∞)
$$\|f\|_{\mathbf{B}^r_{p,\theta}} := \Big(\sum_{s\in\mathbb{N}_0^d} 2^{|s|_1 r\theta}\,\Big\|\sum_{k\in\rho(s)} \hat f(k)\exp(2\pi i k\cdot x)\Big\|_p^{\theta}\Big)^{1/\theta}, \qquad 1\le\theta<\infty,$$
where
$$\rho(s) := \big\{\, k\in\mathbb{Z}^d \;:\; \lfloor 2^{s_j-1}\rfloor \le |k_j| < 2^{s_j},\ j=1,\dots,d \,\big\}, \qquad s\in\mathbb{N}_0^d. \qquad (4.1)$$
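The dyadic blocks (4.1) can be enumerated directly; a small helper with names of our choosing is:

```python
import itertools

def dyadic_block(s):
    """Frequencies k with floor(2^(s_j - 1)) <= |k_j| < 2^(s_j) for each j, cf. (4.1)."""
    axes = []
    for sj in s:
        lo = 2 ** (sj - 1) if sj > 0 else 0
        hi = 2 ** sj
        axes.append([k for k in range(-hi + 1, hi) if lo <= abs(k) < hi])
    return list(itertools.product(*axes))

# example: dyadic_block((1, 2)) gives the 2 * 4 = 8 frequencies with |k_1| = 1 and 2 <= |k_2| < 4
```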
4.1 Recovery of functions belonging to mixed weighted Wiener spaces
Corollary 4.4. Let r > 1/2 and 2 ≤ q ≤ ∞. Let further d, n ∈ N and m > Cr,d n log^3(n + 1) with an appropriate constant Cr,d > 0. Then there is a non-linear recovery operator Am based on (rLasso) or (OMP) using m random samples such that with high probability
$$\sup_{\|f\|_{\mathbf{A}^r_{\mathrm{mix}}}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r+1/q)}\,(\log(n+1))^{(d-1)r+1/2}. \qquad (4.2)$$
Proof. Using [18, Lemma 4.3] and choosing M := ⌊n(r+1/2)/r ⌋ we obtain (4.2) as a direct
consequence of Theorems 3.7, 3.9.
Remark 4.5. The upper bound in Corollary 4.4 is sharp in the main order, which even co-
incides with those for the Gelfand widths. One can show this by using the good embedding properties of Wiener spaces and exact order estimates for the Gelfand widths of Besov space embeddings by Vybiral [39, Theorem 4.12]. Indeed,
$$\varrho_n(\mathbf{A}^r_{\mathrm{mix}}(\mathbb{T}^d))_{L_q} \ge \varrho_n(A^r(\mathbb{T}))_{L_q} \ge \varrho_n(B^{r+1/2}_{2,1})_{L_q} \ge \varrho_n(B^{r+1/2}_{2,1})_{B^0_{q,\infty}} \ge c_n(B^{r+1/2}_{2,1})_{B^0_{q,\infty}} \asymp n^{-(r+1/q)}. \qquad (4.3)$$
In the first line we retreat to the one-dimensional setting.
Remark 4.6 (Nonlinearity helps for Armix). Comparing this upper bound for non-linear approximation with lower bounds for linear approximation shows how much one gains from non-linearity. Indeed, [30, Theorem 4.7] states (in our notation, putting r = 1, s = r) that, for r > 0,
$$\varrho^{\mathrm{lin}}_n(\mathbf{A}^r_{\mathrm{mix}}(\mathbb{T}^d))_{L_q} \ge n^{-r}\,\log(n)^{(d-1)r}.$$
Comparing linear and non-linear approximation of mixed Wiener spaces in Lq, the difference between the main rates is always 1/q; hence the maximal possible gain is attained for q = 2, while for q = ∞ the main rates coincide.
The sharp upper bounds for linear recovery from samples in a more general setting, in particular for the worst-case errors of recovery of functions from weighted Wiener spaces by the Smolyak algorithm, were obtained in [19], see, e.g., Theorem 5.1 and Remark 6.4. In [21, Corollary 23] upper bounds, which are sharp in the case q = 2, were proved for an algorithm that uses subsampled random points; see also [21, Remark 24] for the comparison with the Smolyak algorithm.
Corollary 4.7 (Lower right region Wpr ). Let 1 < p ≤ 2 ≤ q ≤ ∞ and r > 1/p. Let further
d, n ∈ N. Then there is a constant Cr,p,d > 0 such that for
m > Cr,p,d n log3 (n + 1)
there is a non-linear recovery operator Am based on (rLasso) or (OMP) using m random samples
such that with high probability the following asymptotic bound holds
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log(n+1))^{(d-1)(r-2(\frac{1}{p}-\frac{1}{2}))+\frac{1}{2}}. \qquad (4.4)$$
Proof. From Theorems 3.7, 3.9 and the arguments from the proof of Corollary 4.14 in [18] (the class Wpr is the same as SprW(Td) in their notation) we choose M a dyadic number satisfying $n^{2r(r-1/p)^{-1}} \le M \le 2\,n^{2r(r-1/p)^{-1}}$. The corresponding (rLasso) or (OMP) decoder associated to this M, which uses m random samples, guarantees
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_q} \lesssim n^{1/2-1/q}\cdot\big(\sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty} + E_{[-M,M]^d\cap\mathbb{Z}^d}(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty}\big),$$
where we used the upper bound for the best n-term trigonometric approximation from [36, Thm. 2.9] to balance both terms by the choice of M. This yields (4.4).
Remark 4.8 (Main rate sharp in Corollary 4.7). One can show the sharpness of the main rate of convergence in Corollary 4.7 using the fooling argument from [31, Theorem 23] (for d = 1). Actually, the main rate $n^{-(r-(1/p-1/q))}$ is optimal for both linear and nonlinear sampling recovery.
Note that in the region 1 < p < 2 < q < ∞, the recovery from arbitrary linear information of functions from the class Wpr in Lq always outperforms (also non-linear) sampling recovery in the main rate, i.e., λn(Wpr)Lq = o(ϱn(Wpr)Lq).
Interestingly, in the case 1/p + 1/q > 1 the Gelfand widths cn(Wpr)Lq decay faster in the main rate than the respective linear widths λn(Wpr)Lq. For 1/p + 1/q ≤ 1 it holds cn(Wpr)Lq ≍ λn(Wpr)Lq.
Remark 4.9. Let us compare the bounds for (rLasso) and (OMP) from Corollary 4.7 with those for other recovery methods. Here we assume that 1 < p < 2 < q < ∞; the case q = 2 will be discussed separately in Remark 4.10 below.
(i) [Comparison to (Smolyak)] In the paper [9, Cor. 7.1] an upper bound for the linear sampling numbers of Wpr (the same as S^r_{p,2}F(Td) with µ = d in their notation) in Lq has been given for the worst-case recovery using the linear Smolyak algorithm Sn,d, which for r > 1/p, 1 < p < q < ∞ yields that
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - S_{n,d}(f)\|_{L_q} \lesssim n^{-(r-\frac{1}{p}+\frac{1}{q})}\,(\log n)^{(d-1)(r-\frac{1}{p}+\frac{1}{q})}. \qquad (4.5)$$
By the embedding Brp,p ↪ Wpr in case 1 < p < 2 < q < ∞ together with [13, Thm. 5.1,(ii)] we know that we cannot do better in Lq than (4.5) if we restrict to sparse grid (Smolyak) points.
Hence, our non-linear approach outperforms sparse grids if d is large and
2(1/p − 1/2) > 1/p − 1/q ⇐⇒ 1/p + 1/q > 1 .
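For completeness, the elementary computation behind this equivalence is
$$2\Big(\frac{1}{p}-\frac{1}{2}\Big) > \frac{1}{p}-\frac{1}{q} \;\Longleftrightarrow\; \frac{2}{p} - 1 > \frac{1}{p} - \frac{1}{q} \;\Longleftrightarrow\; \frac{1}{p} + \frac{1}{q} > 1 .$$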
(ii) [Comparison to (Lsqr)] In [22, Cor. 21] we obtain (4.5) for 1 < p < 2 < q < ∞ also with a different linear method, namely a plain least squares estimator based on subsampled random points involving the solution of the Kadison-Singer problem [26]. We do not know if the bound
given there is sharp and whether it may outperform (rLasso) or (OMP).
Remark 4.10 (L2 -estimates outperform any linear method). From Corollary 4.7 we obtain
the following important special case for q = 2, see also [18, Corollary 4.16],
$$\sup_{\|f\|_{\mathbf{W}^r_p}\le 1} \|f - A_m(f)\|_{L_2} \lesssim n^{-r+\frac{1}{p}-\frac{1}{2}}\,(\log(n+1))^{(d-1)(r-\frac{1}{p}+\frac{1}{2})}\,(\log(n+1))^{\frac{1}{2}-(d-1)(\frac{1}{p}-\frac{1}{2})}.$$
As mentioned in [18, Remark 4.17], for sufficiently large d the non-linear sampling numbers
decay faster in this situation than the respective linear widths, which coincide in the order of
decay with the linear sampling numbers.
Let us proceed with the case p > 2.
Corollary 4.11 (Left region including small smoothness). Let 2 ≤ p < ∞, 1 ≤ q < ∞. Then
there is a constant Cr,p,d > 0 such that with m = ⌈Cr,p,d n log3 (n + 1)⌉
$$\varrho_m(\mathbf{W}^r_p)_{L_q} \lesssim \begin{cases} n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log(n+1))^{(d-1)(1-r)+r}, & 1/p < r < 1/2,\\[2pt] n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log(n+1))^{(d-1)(1-r)+r}\,(\log\log n)^{r+1}, & r = 1/2,\\[2pt] n^{-(r-(\frac{1}{2}-\frac{1}{q})_+)}\,(\log n)^{(d-1)r+\frac{1}{2}}, & r > 1/2. \end{cases} \qquad (4.6)$$
Proof. Since ∥·∥Lq ≤ ∥·∥L2 for q ≤ 2, it suffices to consider the case 2 ≤ q < ∞. Further, in order
to employ Theorems 3.7, 3.9, we need upper estimates for the quantities σn (Wpr ; T d )L∞ and
E[−M,M ]d ∩Zd (Wpr ; T d )L∞ . The rate of convergence of the respective best n-term approximation
width for 2 ≤ p < ∞ is
$$\sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty} \lesssim \begin{cases} n^{-r}\,(\log n)^{(d-1)(1-r)+r}, & 1/p < r < 1/2,\\[2pt] n^{-r}\,(\log n)^{(d-1)(1-r)+r}\,(\log\log n)^{r+1}, & r = 1/2,\\[2pt] n^{-r}\,(\log n)^{(d-1)r+1/2}, & r > 1/2. \end{cases} \qquad (4.7)$$
The case of small smoothness is known from [38, Theorems 6.1, 6.2], the big smoothness case
is taken from [36, Theorem 1.3], see also [12, Theorem 7.5.2].
In what follows we show that for an appropriately chosen M = M (n, r, p), the quantity
E[−M,M ]d ∩Zd (Wpr ; T d )L∞ decays faster than the respective best n-term approximation, see
Lemma 4.12 below.
Hence, Theorems 3.7, 3.9 yield the estimate
$$\varrho_{\lceil C_{r,p,d}\, n\, \log^3(n+1)\rceil}(\mathbf{W}^r_p)_{L_q} \le 2\, n^{1/2-1/q}\cdot \sigma_n(\mathbf{W}^r_p;\mathcal{T}^d)_{L_\infty}.$$
where the blocks ρ(s) are defined in (4.1).
In what follows we use Hölder's inequality and obtain
$$\begin{aligned}
&\sup_{\|f\|_{\mathbf{B}^r_{p,p}}\le 1}\ \sum_{\substack{s\in\mathbb{N}_0^d:\ \exists s_j\,:\, 2^{s_j}>M}} 2^{-|s|_1(r-\frac{1}{p})}\; 2^{r|s|_1}\Big\|\sum_{k\in\rho(s)}\hat f(k)\exp(2\pi i k\cdot x)\Big\|_p \\
&\qquad\le M^{-(r-\frac{1}{p})}\Big(\sum_{s\in\mathbb{N}_0^d} 2^{-|s|_1(r-\frac{1}{p})(1-\frac{1}{p})^{-1}}\Big)^{1-\frac{1}{p}}\ \sup_{\|f\|_{\mathbf{B}^r_{p,p}}\le 1}\|f\|_{\mathbf{B}^r_{p,p}} \;\lesssim\; M^{-(r-\frac{1}{p})}.
\end{aligned}$$
Choosing the parameter M such that $n^{2r(r-1/p)^{-1}} \le M \le 2\,n^{2r(r-1/p)^{-1}}$ implies (4.8).
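To see the effect of this choice of M, note the elementary computation
$$M^{-(r-\frac{1}{p})} \asymp \Big(n^{\frac{2r}{r-1/p}}\Big)^{-(r-\frac{1}{p})} = n^{-2r},$$
so the right-hand side above indeed decays faster than each of the best n-term rates in (4.7).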
Remark 4.13 (Left upper region – almost sharp). (i) For 1 < q < 2 < p < ∞, the order of the Gelfand widths is $c_n(\mathbf{W}^r_p)_{L_q} \asymp \lambda_n(\mathbf{W}^r_p)_{L_q} \asymp n^{-r}(\log n)^{(d-1)r}$ (see e.g. [12, Section 9.6]). With (rLasso) and (OMP) we obtain the same main rate but additional (d-independent) logarithms, i.e., the bound is almost optimal w.r.t. the Gelfand widths.
(ii) (Comparison to (Lsqr) and (Smolyak)) The sharp (w.r.t. Gelfand numbers) bound for (Lsqr) in the case 1 < q < 2 < p < ∞ was obtained in [22, Cor. 21]. Note that the approach in [22] requires square summability of the linear widths and covers only the case r > 1/2, whereas in [21] and [37] this condition can be avoided at the price of a d-independent logarithm.
(iii) In the region 1 < q < 2 < p < ∞, the right order for (Smolyak) behaves as $n^{-r}(\log n)^{(d-1)(r+1/2)}$ (see [12, Thm. 5.3.1] and references therein). In fact, by the embedding Brp,2 ↪ Wpr together with [13, Thm. 5.1,(ii)] we know that we cannot do better in Lq if we restrict to sparse grid points. For large d this estimate is worse than those for (rLasso) and (OMP) by logarithmic factors whose exponent grows with d.
Remark 4.14 (Left lower region – Smolyak is optimal). We will further distinguish two cases:
2 < p < q < ∞ (lower triangular) and 2 < q < p < ∞ (upper triangular).
(i) (Lower triangular) In this region we know the exact (w.r.t. Gelfand and linear widths) order (4.5) for (Smolyak) from the paper [9, Cor. 7.1], which is better in the main rate than the bound for (rLasso) and (OMP) (which is in turn better than the one for (Lsqr) from [22, Cor. 21]). Note that for p = 2 < q ≤ ∞ (Lsqr) gives the same (sharp) order of decay as (Smolyak).
(ii) (Upper triangular) For 2 < q < p < ∞ we do not know anything about the optimality of (linear and non-linear) sampling algorithms. (rLasso) gives estimates that are worse in the main rate than the Gelfand widths; in turn, the existing upper bounds for (Smolyak) and (Lsqr) are worse than those for the linear widths. Note that in this region the Gelfand numbers decay faster than the linear widths in the main rate.
Remark 4.15 (Right upper region – Smolyak is optimal among linear methods). The region
1 < p, q < 2 consists of two triangular areas: 1 < p < q < 2 (lower triangular) and 1 < q ≤
p < 2 (upper triangular). For 1 < p < q < 2, the bound for (Smolyak) [9, Cor. 7.1] coincides
with that for the linear widths. In the case 1 < q ≤ p < 2 we cannot say anything about optimality with respect to either the linear or the Gelfand widths.
Acknowledgement. The first named author MM is supported by the ESF, being co-financed
by the European Union and from tax revenues on the basis of the budget adopted by the
Saxonian State Parliament. KP would like to acknowledge support by the Philipp Schwartz
Fellowship of the Alexander von Humboldt Foundation and the German Research Foundation
(DFG 403/4-1). KP and TU would like to thank Ben Adcock for pointing out reference [7]
and bringing square root Lasso as a noise blind alternative to the authors’ attention during a
discussion within the Session Function recovery and discretization problems organized by David
Krieg and KP at the conference MCQMC24 in Waterloo (CA).
References
[1] B. Adcock, A. Bao, and S. Brugiapaglia. Correcting for unknown errors in sparse high-
dimensional function approximation. Numer. Math., 142(3):667–711, 2019.
[2] F. Bartel, M. Schäfer, and T. Ullrich. Constructive subsampling of finite frames with
applications in optimal function recovery. Appl. Comput. Harmon. Anal., 65:209–248,
2023.
[3] É. S. Belinskii. Approximation of functions of several variables by trigonometric polyno-
mials with given number of harmonics, and estimates of ϵ-entropy. Analysis Mathematica,
15:67–74, 1989.
[4] A. Belloni, V. Chernozhukov, and L. Wang. Square-root lasso: pivotal recovery of sparse
signals via conic programming. Biometrika, 98(4):791–806, 2011.
[5] A. Berk, S. Brugiapaglia, and T. Hoheisel. Square root LASSO: Well-posedness, Lipschitz
stability and the tuning trade off. SIAM Journal on Optimization, 34(3):2609–2637, 2024.
[6] J. Bourgain. An improved estimate in the restricted isometry problem. In Geometric as-
pects of functional analysis, volume 2116 of Lecture Notes in Math., pages 65–70. Springer,
Cham, 2014.
[7] S. Brugiapaglia, S. Dirksen, H. C. Jung, and H. Rauhut. Sparse recovery in bounded Riesz
systems with applications to numerical methods for PDEs. Applied and Computational
Harmonic Analysis, 53:231–269, 2021.
[8] H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.
[9] G. Byrenheid and T. Ullrich. Optimal sampling recovery of mixed order Sobolev em-
beddings via discrete Littlewood–Paley type characterizations. Analysis Mathematica,
43:807–820, 2017.
[10] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approxima-
tion. J. Amer. Math. Soc., 22(1):211–231, 2009.
[11] J. Creutzig and P. Wojtaszczyk. Linear vs. nonlinear algorithms for linear problems.
Journal of Complexity, 20(6):807–820, 2004.
[12] D. Dũng, V. N. Temlyakov, and T. Ullrich. Hyperbolic cross approximation. Advanced
Courses in Mathematics. CRM Barcelona. Birkhäuser/Springer, Cham, 2018. Edited and
with a foreword by Sergey Tikhonov.
[13] D. Dũng and T. Ullrich. Lower bounds for the integration error for multivariate functions
with mixed smoothness and optimal Fibonacci cubature for functions on the square. Math.
Nachr., 288(7):743–762, 2015.
[14] F. Dai and V. N. Temlyakov. Random points are good for universal discretization. J.
Math. Anal. Appl., 529(1):Paper No. 127570, 28, 2024.
[15] M. Dolbeault, D. Krieg, and M. Ullrich. A sharp upper bound for sampling numbers in
L2 . Appl. Comput. Harmon. Anal., 63:113–134, 2023.
[16] S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Applied
and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
[17] I. Haviv and O. Regev. The restricted isometry property of subsampled Fourier matrices.
In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2014–2016, pages
163–179. Springer, 2017.
[18] T. Jahn, T. Ullrich, and F. Voigtlaender. Sampling numbers of smoothness classes via
ℓ1 -minimization. Journal of Complexity, 79:Paper No. 101786, 35, 2023.
[20] D. Krieg. Tractability of sampling recovery on unweighted function classes. Proc. Amer.
Math. Soc. Ser. B, 11:115–125, 2024.
[21] D. Krieg, K. Pozharska, M. Ullrich, and T. Ullrich. Sampling projections in the uniform
norm. arXiv:math/2401.02220, 2024.
[22] D. Krieg, K. Pozharska, M. Ullrich, and T. Ullrich. Sampling recovery in L2 and other
norms. arXiv:math/2305.07539, 2024.
[23] D. Krieg and M. Ullrich. Function values are enough for L2 -approximation. Found. Com-
put. Math., 21(4):1141–1151, 2021.
[24] S. Kunis and H. Rauhut. Random sampling of sparse trigonometric polynomials. II. Or-
thogonal matching pursuit versus basis pursuit. Found. Comput. Math., 8(6):737–763,
2008.
[26] A. W. Marcus, D. A. Spielman, and N. Srivastava. Interlacing families II: Mixed charac-
teristic polynomials and the Kadison-Singer problem. Ann. of Math. (2), 182(1):327–350,
2015.
[27] M. Moeller. Gelfand numbers and best m-term trigonometric approximation for weighted
mixed Wiener classes in L2 . Master’s thesis, TU Chemnitz, Germany, 2023.
[29] N. Nagel, M. Schäfer, and T. Ullrich. A new upper bound for sampling numbers. Found.
Comput. Math., 22(2):445–468, 2022.
[31] E. Novak and H. Triebel. Function spaces in Lipschitz domains and optimal rates of
convergence for sampling. Constructive Approximation, 23:325–350, 2005.
[33] H. B. Petersen and P. Jung. Robust instance-optimal recovery of sparse signals at unknown
noise levels. Inf. Inference, 11(3):845–887, 2022.
[34] S. A. Smolyak. Quadrature and interpolation formulas for tensor products of certain classes
of functions. Dokl. Akad. Nauk SSSR, 148:1042–1045, 1963.
[37] V. N. Temlyakov and T. Ullrich. Bounds on Kolmogorov widths and sampling recovery
for classes with small mixed smoothness. Journal of Complexity, 67:101575, 2021.
[38] V. N. Temlyakov and T. Ullrich. Approximation of functions with small mixed smoothness
in the uniform norm. Journal of Approximation Theory, 277:105718, 2022.
[40] T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Trans.
Inform. Theory, 57(9):6215–6221, 2011.
[41] A. Zygmund. Trigonometric series. Vol. I, II. Cambridge Mathematical Library. Cam-
bridge University Press, Cambridge, 2002.