Survey
Compressed Sensing
Key words Dimension reduction, frames, greedy algorithms, ill-posed inverse problems, ℓ1
minimization, random matrices, sparse approximation, sparse recovery
MSC (2000) 94A12, 65F22, 94A20, 68U10, 90C25, 15B52
Compressed sensing is a novel research area, introduced in 2006, which has since become a key concept in various areas of applied mathematics, computer science, and electrical engineering. Its surprising prediction is that high-dimensional signals which admit a sparse representation with respect to a suitable basis or, more generally, a frame can be recovered from what were previously considered highly incomplete linear measurements, using efficient algorithms. This article serves as an introduction to and survey of compressed sensing.
1 Introduction
The area of compressed sensing was initiated in 2006 by two groundbreaking papers, namely
[18] by Donoho and [11] by Candès, Romberg, and Tao. Nowadays, after only 6 years, an abundance of theoretical aspects of compressed sensing has been explored in more than 1000
articles. Moreover, this methodology is to date extensively utilized by applied mathemati-
cians, computer scientists, and engineers for a variety of applications in astronomy, biology,
medicine, radar, and seismology, to name a few.
The key idea of compressed sensing is to recover a sparse signal from very few non-
adaptive, linear measurements by convex optimization. Taking a different viewpoint, it con-
cerns the exact recovery of a high-dimensional sparse vector after a dimension reduction step.
From yet another standpoint, we can regard the problem as computing a sparse coefficient
vector for a signal with respect to an overcomplete system. The theoretical foundation of
compressed sensing has links with and also explores methodologies from various other fields
such as, for example, applied harmonic analysis, frame theory, geometric functional analysis,
numerical linear algebra, optimization theory, and random matrix theory.
It is interesting to notice that this development – the problem of sparse recovery – can in
fact be traced back to earlier papers from the 90s such as [24] and later the prominent papers
by Donoho and Huo [21] and Donoho and Elad [19]. When the previously mentioned two
fundamental papers introducing compressed sensing were published, the term ‘compressed
sensing’ was initially utilized for random sensing matrices, since those allow for a minimal
number of non-adaptive, linear measurements. Nowadays, the terminology ‘compressed sens-
ing’ is more and more often used interchangeably with ‘sparse recovery’ in general, which is
a viewpoint we will also take in this survey paper.
Let x = (xi)ni=1 ∈ Rn be the signal of interest. As prior information, we either assume that x itself is sparse, i.e., that
‖x‖0 := #{i : xi ≠ 0}
is small, or that there exists an orthonormal basis or a frame1 Φ such that x = Φc with c
being sparse. For this, we let Φ be the matrix with the elements of the orthonormal basis
or the frame as column vectors. In fact, a frame typically provides more flexibility than an
orthonormal basis due to its redundancy and hence often leads to improved sparsifying properties;
for this reason, frames are customarily employed in this setting rather than orthonormal bases.
Sometimes the notion of sparsity is weakened; for now – before making this precise in
Section 2 – we refer to such vectors as approximately sparse. Further, let A be an m × n matrix,
which is typically called sensing matrix or measurement matrix. Throughout we will always
assume that m < n and that A does not possess any zero columns, even if not explicitly
mentioned.
Then the Compressed Sensing Problem can be formulated as follows: Recover x from
knowledge of
y = Ax, or recover c from knowledge of
y = AΦc.
In both cases, we face an underdetermined linear system of equations with sparsity as prior
information about the vector to be recovered. Note, however, that the support of this vector is
not assumed to be known, since otherwise the solution could be obtained trivially.
This leads us to the following questions:
• What are suitable signal and sparsity models?
• How, when, and with how much accuracy can the signal be algorithmically recovered?
• What are suitable sensing matrices?
In this section, we will discuss these questions briefly to build up intuition for the subsequent
sections.
1 Recall that a frame for a Hilbert space H is a system (ϕi)i∈I in H for which there exist frame bounds
0 < A ≤ B < ∞ such that A‖x‖₂² ≤ Σi∈I |⟨x, ϕi⟩|² ≤ B‖x‖₂² for all x ∈ H. A tight frame allows A = B. If
A = B = 1 can be chosen, (ϕi)i∈I forms a Parseval frame. For further information, we refer to [12].
Fig. 1 (a) Mathematics building of TU Berlin (Photo by TU-Pressestelle); (b) Wavelet decomposition
Depending on the signal, a variety of representation systems which provide sparse approximations
is available, and this collection is constantly being expanded. In fact, it was recently shown
that wavelet systems do not provide optimally sparse approximations in a regularity setting
which appears to be suitable for most natural images, but the novel system of shearlets does
[46, 47]. Hence, assuming some prior knowledge of the signal to be sensed or compressed,
typically suitable, well-analyzed representation systems are already at hand. If this is not the
case, more data sensitive methods such as dictionary learning algorithms (see, for instance,
[2]), in which a suitable representation system is computed for a given set of test signals, are
available.
Depending on the application at hand, often x is already sparse itself. Think, for instance,
of digital communication, when a cell phone network with n antennas and m users needs to
be modelled. Or consider genomics, when in a test study m genes shall be analyzed with n
patients taking part in the study. In the first scenario, very few of the users have an ongoing
call at a specific time; in the second scenario, very few of the genes are actually active. Thus,
x being sparse itself is also a very natural assumption.
In the compressed sensing literature, most results indeed assume that x itself is sparse, and
the problem y = Ax is considered. Very few articles study the problem of incorporating a
sparsifying orthonormal basis or frame; we mention specifically [9, 61]. In this paper, we
will also assume throughout that x is already a sparse vector. It should be emphasized that
‘exact’ sparsity is often too restricting or unnatural, and weakened sparsity notions need to be
taken into account. On the other hand, sometimes – such as with the tree structure of wavelet
coefficients – some structural information on the non-zero coefficients is known, which leads
to diverse structured sparsity models. Section 2 provides an overview of such models.
Due to the unavoidable combinatorial search, this algorithm is however NP-hard [53]. The
main idea of Chen, Donoho, and Saunders in the fundamental paper [14] was to substitute
the ℓ0 ‘norm’ by the closest convex norm, which is the ℓ1 norm. This leads to the following
minimization problem, which they coined Basis Pursuit:

min ‖x‖1 subject to y = Ax.        (P1)

Due to the shape of the ℓ1 ball, ℓ1 minimization indeed promotes sparsity. For an illustration
of this fact, we refer the reader to Figure 2, in which ℓ1 minimization is compared to ℓ2 minimization
over the affine feasible set {x : y = Ax}. We would also like to draw the reader’s attention to the
small numerical example in Figure 3, in which a partial Fourier matrix is chosen as measurement matrix.
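To make the Basis Pursuit step concrete, the following Python sketch (an illustration of ours, not part of the original presentation) solves min ‖x‖1 subject to Ax = y via the standard linear programming reformulation x = u − v with u, v ≥ 0; any LP or convex solver could be substituted, and the toy Gaussian measurement setup is purely for demonstration.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y via the LP with x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])               # equality constraint A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

# toy example: recover a 5-sparse vector from 80 Gaussian measurements
rng = np.random.default_rng(0)
n, m, k = 200, 80, 5
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = basis_pursuit(A, A @ x)
print(np.max(np.abs(x_hat - x)))            # typically close to machine precision
```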
The general question of when ‘ℓ0 = ℓ1 ’ holds is key to compressed sensing. Both necessary
and sufficient conditions have been provided, which not only depend on the sparsity of the
original vector x, but also on the incoherence of the sensing matrix A, which will be made
precise in Section 3.
Since for very large data sets ℓ1 minimization is often not feasible even when the solvers
are adapted to the particular structure of compressed sensing problems, various other types of
recovery algorithms were suggested. These can be roughly separated into convex optimiza-
tion, greedy, and combinatorial algorithms (cf. Section 5), each one having its own advantages
and disadvantages.
Fig. 3 (a) Original signal f with random sample points (indicated by circles); (b) The Fourier transform
f̂; (c) Perfect recovery of f̂ by ℓ1 minimization; (d) Recovery of f̂ by ℓ2 minimization
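The measurement matrix of this numerical example, a partial Fourier matrix, is easily generated; the following Python sketch (ours; the function name and the unit-column normalization are our choices) selects m random rows of the n × n discrete Fourier transform matrix.

```python
import numpy as np

def partial_fourier_matrix(m, n, rng=None):
    """m randomly selected rows of the n x n DFT matrix, scaled to unit-norm columns."""
    rng = rng or np.random.default_rng(0)
    rows = rng.choice(n, size=m, replace=False)                 # random frequency subset
    F = np.exp(-2j * np.pi * np.outer(rows, np.arange(n)) / n)  # selected DFT rows
    return F / np.sqrt(m)

A = partial_fourier_matrix(64, 512)   # 64 Fourier measurements of a length-512 signal
```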
It is still an open question (cf. Section 4 for more details) whether deterministic matrices
can be carefully constructed to have similar properties with respect to compressed sensing
problems. At the moment, different approaches towards this problem are being taken such
as structured random matrices by, for instance, Rauhut et al. in [58] or [60]. Moreover, most
applications do not allow for a free choice of the sensing matrix and enforce a particularly
structured matrix. Exemplary situations are the application of data separation, in which the
sensing matrix has to consist of two or more orthonormal bases or frames [32, Chapter 11],
or high resolution radar, for which the sensing matrix has to bear a particular time-frequency
structure [38].
Such applications pose intriguing challenges to the area due to the constraints they impose, which in turn
gives rise to novel theoretical problems. Finally, we observe that, due to the need for fast
sparse recovery algorithms in particular, there is a trend to cooperate more closely with mathematicians
from other research areas, for example from optimization theory, numerical linear algebra, or
random matrix theory.
As three examples of recently initiated research directions, we would like to mention the
following. First, while the theory of compressed sensing focusses on digital data, it is desirable
to develop a similar theory for the continuum setting. Two promising approaches have so far been
suggested by Eldar et al. (cf. [52]) and Adcock et al. (cf. [1]). Second, in contrast to Basis
Pursuit, which minimizes the ℓ1 norm of the synthesis coefficients, several approaches such
as recovery of missing data minimize the ℓ1 norm of the analysis coefficients instead, see
Subsections 6.1.2 and 6.2.2. The relation between these two minimization problems is far from
clear, and the recently introduced notion of co-sparsity [54] is an interesting approach to shed light on this problem.
Third, the utilization of frames as a sparsifying system in the context of compressed sensing
has become a topic of increased interest, and we refer to the initial paper [9].
The reader might also want to consult the extensive webpage dsp.rice.edu/cs con-
taining most published papers in the area of compressed sensing subdivided into different
topics. We would also like to draw the reader’s attention to the recent books [29] and [32] as
well as the survey article [7].
1.6 Outline
In Section 2, we start by discussing different sparsity models including structured sparsity
and sparsifying dictionaries. The next section, Section 3, is concerned with presenting both
necessary and sufficient conditions for exact recovery using ℓ1 minimization as a recovery
strategy. The delicateness of designing sensing matrices is the focus of Section 4. In Section
5, other algorithmic approaches to sparse recovery are presented. Finally, applications such
as data separation are discussed in Section 6.
2 Signal Models
Sparsity is the prior information assumed of the vector we intend to efficiently sense or whose
dimension we intend to reduce, depending on which viewpoint we take. We will start by
recalling some classical notions of sparsity. Since applications typically impose a certain
structure on the significant coefficients, various structured sparsity models were introduced
which we will subsequently present. Finally, we will discuss how to ensure sparsity through
an appropriate orthonormal basis or frame.
2.1 Sparsity
The most basic notion of sparsity states that a vector has at most k non-zero coefficients. This
is measured by the ℓ0 ‘norm’, which for simplicity we will throughout refer to as a norm
although it is well-known that k · k0 does not constitute a mathematical norm.
Definition 2.1 A vector x = (xi)ni=1 ∈ Rn is called k-sparse if
‖x‖0 = #{i : xi ≠ 0} ≤ k.
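As a small illustration (ours, not part of the original text), the ℓ0 ‘norm’ and the best k-term approximation – the quantity underlying the weakened, approximately sparse model mentioned in the introduction – can be computed as follows.

```python
import numpy as np

def l0_norm(x):
    """The l0 'norm' of Definition 2.1: number of non-zero entries."""
    return np.count_nonzero(x)

def best_k_term(x, k):
    """Best k-term approximation (1 <= k <= len(x)): keep the k largest entries in
    magnitude, set the rest to zero. For an exactly k-sparse x this returns x itself;
    otherwise the size of x - best_k_term(x, k) measures how far x is from k-sparsity."""
    keep = np.argpartition(np.abs(x), -k)[-k:]   # indices of the k largest magnitudes
    xk = np.zeros_like(x)
    xk[keep] = x[keep]
    return xk
```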
‖1Λc x‖1 ≤ δ.
The notion of k-sparsity can also be regarded from a more general viewpoint, which simul-
taneously imposes additional structure. Let x ∈ Rn be a k-sparse signal. Then it belongs to
the linear subspace consisting of all vectors with the same support set. Hence the set Σk is the
union of all subspaces of vectors with support Λ satisfying |Λ| ≤ k. Thus, a natural extension
of this concept is the following definition, initially introduced in [49].
Definition 2.4 A vector x ∈ Rn is said to belong to a union of subspaces, if there exists a
family of subspaces (Wj)j=1,…,N in Rn such that
x ∈ W1 ∪ · · · ∪ WN.
At about the same time, the notion of fusion frame sparsity was introduced in [6]. Fusion
frames are families of subspaces with frame-like properties, thereby allowing for stability
considerations. A family of subspaces (Wj)j=1,…,N in Rn is a fusion frame with bounds A and B,
if
A‖x‖₂² ≤ Σj=1,…,N ‖PWj(x)‖₂² ≤ B‖x‖₂² for all x ∈ Rn,
where PWj denotes the orthogonal projection onto the subspace Wj , see also [13] and [12,
Chapter 13]. Fusion frame theory extends classical frame theory by allowing the analy-
sis of signals through projections onto arbitrary dimensional subspaces as opposed to one-
dimensional subspaces in frame theory, hence serving also as a model for distributed process-
ing, cf. [62]. The notion of fusion frame sparsity then provides a more structured approach
than mere membership in a union of subspaces.
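To make the fusion frame condition concrete, the following sketch (ours) computes the tightest bounds A and B from orthonormal bases of the subspaces, using the fact that Σj ‖PWj(x)‖₂² = ⟨Sx, x⟩ with S = Σj PWj, so that A and B are the extreme eigenvalues of S.

```python
import numpy as np

def fusion_frame_bounds(bases):
    """Tightest fusion frame bounds (A, B) for subspaces W_j, each given by a matrix
    whose columns form an orthonormal basis of W_j; they are the extreme eigenvalues
    of S = sum_j P_{W_j}, since sum_j ||P_{W_j} x||^2 = <S x, x>."""
    S = sum(U @ U.T for U in bases)      # P_{W_j} = U_j U_j^T for orthonormal columns U_j
    eig = np.linalg.eigvalsh(S)          # eigenvalues in ascending order
    return eig[0], eig[-1]

# toy example in R^4: two complementary 2-dimensional coordinate subspaces
U1, U2 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
print(fusion_frame_bounds([U1, U2]))     # (1.0, 1.0): a tight fusion frame
```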
Applications such as manifold learning assume that the signal under consideration lives on
a general manifold, thereby forcing us to leave the world of linear subspaces. In such cases,
the signal class is often modeled as a non-linear k-dimensional manifold M in Rn , i.e.,
x ∈ M = {f (θ) : θ ∈ Θ}
with Θ being a k-dimensional parameter space. Such signals are then considered k-sparse in
the manifold model, see [65]. For a survey chapter about this topic, the interested reader is
referred to [32, Chapter 7].
We wish to finally mention that applications such as matrix completion require generaliza-
tions of vector sparsity by considering, for instance, low-rank matrix models. This is however
beyond the scope of this survey paper, and we refer to [32] for more details.
(i) If a solution x of (P0) satisfies ‖x‖0 ≤ k, then this is the unique solution.
Proof. (i) ⇒ (ii). We argue by contradiction. If (ii) does not hold, by Lemma 3.2, there exists
some h ∈ N(A), h ≠ 0, such that ‖h‖0 ≤ 2k. Thus, there exist x and x̃ satisfying h = x − x̃
and ‖x‖0, ‖x̃‖0 ≤ k, but Ax = Ax̃, a contradiction to (i).
(ii) ⇒ (i). Let x and x̃ satisfy y = Ax = Ax̃ and ‖x‖0, ‖x̃‖0 ≤ k. Then x − x̃ ∈ N(A)
and ‖x − x̃‖0 ≤ 2k < spark(A). By Lemma 3.2, it follows that x − x̃ = 0, which implies
(i).
An equivalent condition for the existence of a unique sparse solution of (P1 ) can now be
stated in terms of the null space property. For the proof, we refer to [15].
Theorem 3.5 ( [15]) Let A be an m × n matrix, and let k ∈ N. Then the following
conditions are equivalent.
(i) If a solution x of (P1) satisfies ‖x‖0 ≤ k, then this is the unique solution.
It should be emphasized that [15] studies the Compressed Sensing Problem in a much more
general way by analyzing quite general encoding-decoding strategies.
The mutual coherence of a matrix attains its maximal value 1 precisely when two of its columns
are linearly dependent. The lower bound presented in the next result, also known as the
Welch bound, is more interesting. It can be shown that it is attained by so-called optimal
Grassmannian frames [63], see also Section 4.
Lemma 3.7 Let A be an m × n matrix. Then we have
µ(A) ∈ [ √((n − m)/(m(n − 1))) , 1 ].
Let us mention that different variants of mutual coherence exist, in particular, the Babel
function [19], the cumulative coherence function [64], the structured p-Babel function [4], the
fusion coherence [6], and cluster coherence [22]. The notion of cluster coherence will in fact
be later discussed in Section 6 for a particular application.
Imposing a bound on the sparsity of the original vector by the mutual coherence of the
sensing matrix, the following result can be shown; its proof can be found in [19].
Theorem 3.8 ( [19, 30]) Let A be an m × n matrix, and let x ∈ Rn \ {0} be a solution of
(P0) satisfying
‖x‖0 < ½ (1 + µ(A)−1).
Then x is the unique solution of (P0) and (P1).
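The quantities appearing in Lemma 3.7 and Theorem 3.8 are straightforward to evaluate numerically; the following sketch (ours, using a Gaussian matrix purely as an example) computes the mutual coherence, the Welch bound, and the resulting guaranteed sparsity level ½(1 + µ(A)−1).

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct, normalized columns of A."""
    An = A / np.linalg.norm(A, axis=0)      # unit-norm columns
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(0)
m, n = 64, 256
A = rng.standard_normal((m, n))
mu = mutual_coherence(A)
welch = np.sqrt((n - m) / (m * (n - 1)))    # lower bound from Lemma 3.7
k_max = 0.5 * (1 + 1 / mu)                  # sparsity level guaranteed by Theorem 3.8
print(mu, welch, k_max)
```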
Theorem 3.11 ( [25, 26]) Let C n be defined as in (1), let A be an m × n matrix, and let
the polytope P be defined by P = AC n ⊆ Rm . Then the following conditions are equivalent.
(i) The number of k-faces of P equals the number of k-faces of C n .
(ii) (P0 ) = (P1 ).
The geometric intuition behind this result is the fact that the number of k-faces of P equals
the number of indexing sets Λ ⊆ {1, . . . , n} with |Λ| = k such that vectors x satisfying
supp x = Λ can be recovered via (P1 ).
Extending these techniques, Donoho and Tanner were also able to provide highly accurate
analytical descriptions of the phase transition which occurs when the area of exact recovery
is considered in dependence on the ratio of the number of equations to the number of unknowns,
m/n, versus the ratio of the number of nonzeros to the number of equations, k/m. The interested
reader is referred to [27] for further details.
4 Sensing Matrices
Ideally, we aim for a matrix which has high spark, low mutual coherence, and a small RIP
constant. As our discussion in this section will show, these properties are often quite difficult
to achieve, and even computing, for instance, the RIP constant is computationally intractable
in general (see [59]).
In the sequel, after presenting some general relations between the introduced notions of
spark, NSP, mutual coherence, and RIP, we will discuss some explicit constructions for, in
particular, mutual coherence and RIP.
(ii) A satisfies the RIP of order k with δk = kµ(A) for all k < µ(A)−1 .
(iii) Suppose A satisfies the RIP of order 2k with δ2k < √2 − 1. If
√2 δ2k / (1 − (1 + √2) δ2k) < √(k/n),
4.3 RIP
We begin by discussing some deterministic constructions of matrices satisfying the RIP. The
first noteworthy construction was presented by DeVore and requires m ≳ k², see [17]. A very
recent, highly sophisticated approach [5] by Bourgain et al. still requires m ≳ k^(2−α) for
some small constant α > 0. Hence, up to now, deterministic constructions require a large m, which
is typically not feasible for applications, since it scales (almost) quadratically in k.
The construction of random sensing matrices satisfying RIP is a possibility to circumvent
this problem. Such constructions are closely linked to the famous Johnson-Lindenstrauss
Lemma, which is extensively utilized in numerical linear algebra, machine learning, and other
areas requiring dimension reduction.
Theorem 4.2 (Johnson-Lindenstrauss Lemma [41]) Let ε ∈ (0, 1), let x1, . . . , xp ∈ Rn,
and let m = O(ε−2 log p) be a positive integer. Then there exists a Lipschitz map f : Rn →
Rm such that
(1 − ε)‖xi − xj‖₂² ≤ ‖f(xi) − f(xj)‖₂² ≤ (1 + ε)‖xi − xj‖₂² for all i, j ∈ {1, . . . , p}.
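A quick numerical illustration (ours; the constant 8 in the choice of m is an ad hoc stand-in for the unspecified constant in the O(ε−2 log p) bound): a random Gaussian map typically distorts all pairwise squared distances among p points by less than ε.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 10_000, 50, 0.2
m = int(np.ceil(8 * np.log(p) / eps**2))      # m = O(eps^-2 log p), heuristic constant 8
X = rng.standard_normal((p, n))               # p points in R^n
A = rng.standard_normal((m, n)) / np.sqrt(m)  # random linear map f(x) = Ax
Y = X @ A.T

worst = 0.0                                   # largest relative distortion over all pairs
for i in range(p):
    for j in range(i + 1, p):
        d_orig = np.sum((X[i] - X[j])**2)
        d_proj = np.sum((Y[i] - Y[j])**2)
        worst = max(worst, abs(d_proj / d_orig - 1.0))
print(m, worst)                               # worst distortion, typically below eps
```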
The key requirement for a matrix to satisfy the Johnson-Lindenstrauss Lemma with high
probability is the following concentration inequality for an arbitrarily fixed x ∈ Rn :
P( (1 − ε)‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + ε)‖x‖₂² ) ≥ 1 − 2e^(−c0 ε² m),    (2)
with the entries of A being generated by a certain probability distribution. The relation of RIP
to the Johnson-Lindenstrauss Lemma is established in the following result. We also mention
that recently even a converse of the following theorem was proved in [43].
Theorem 4.3 ( [3]) Let δ ∈ (0, 1). If the probability distribution generating the m × n
matrices A satisfies the concentration inequality (2) with ε = δ, then there exist constants
c1, c2 such that, with probability at least 1 − 2e^(−c2 δ² m), A satisfies the RIP of order k with δ for all
k ≤ c1 δ² m / log(n/k).
This observation was then used in [3] to prove that Gaussian and Bernoulli random matrices
satisfy the RIP of order k with δ provided that m ≳ δ−2 k log(n/k). Up to a constant, lower
bounds for Gelfand widths of ℓ1-balls [35] show that this dependence on k and n is indeed
optimal.
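To give a feeling for the scaling m ≳ δ−2 k log(n/k), the following sketch (ours; the constant C is arbitrary, since the theory only asserts the existence of suitable constants) draws a Gaussian matrix of the corresponding size and performs a purely heuristic spot check of the restricted isometry behavior on a few random supports – this is not a certificate, as verifying the RIP exactly is intractable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, delta, C = 1024, 10, 0.3, 1.0
m = int(np.ceil(C * k * np.log(n / k) / delta**2))   # m ~ delta^-2 k log(n/k)
A = rng.standard_normal((m, n)) / np.sqrt(m)         # Gaussian sensing matrix

worst = 0.0
for _ in range(200):                                 # random k-column submatrices
    S = rng.choice(n, size=k, replace=False)
    s = np.linalg.svd(A[:, S], compute_uv=False)     # singular values of A restricted to S
    worst = max(worst, abs(s[0]**2 - 1.0), abs(s[-1]**2 - 1.0))
print(m, worst)   # worst observed deviation of ||A_S x||^2 / ||x||^2 from 1 on these supports
```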
5 Recovery Algorithms
In this section, we will provide a brief overview of the different types of algorithms typically
used for sparse recovery. Convex optimization algorithms require very few measurements but
are computationally more complex. On the other extreme are combinatorial algorithms, which
are very fast – often sublinear – but require many measurements that are sometimes difficult
to obtain. Greedy algorithms are in some sense a good compromise between those extremes
concerning computational complexity and the required number of measurements.
most commonly used. If the measurements are affected by noise, a conic constraint is required;
i.e., the minimization problem needs to be changed to
min ‖x‖1 subject to ‖Ax − y‖2 ≤ ε
for a carefully chosen ε > 0. For a particular regularization parameter λ > 0, this problem is
equivalent to the unconstrained version given by
min ½ ‖Ax − y‖₂² + λ‖x‖1.
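The unconstrained version can be attacked, for instance, by iterative soft thresholding in the spirit of [16]; the following Python sketch (ours; the number of iterations and the step size 1/‖A‖₂² are illustrative choices) implements this scheme.

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Iterative soft thresholding for min_x (1/2)||Ax - y||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = x - (A.T @ (A @ x - y)) / L            # gradient step on the smooth part
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft thresholding
    return x
```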
Input:
• Matrix A = (ai)i=1,…,n ∈ Rm×n and measurement vector y ∈ Rm.
• Error threshold ε.
Algorithm:
1) Set k = 0.
2) Set the initial solution x^0 = 0.
3) Set the initial residual r^0 = y − Ax^0 = y.
4) Set the initial support S^0 = supp x^0 = ∅.
5) Repeat
6) Set k = k + 1.
7) Choose i0 such that min_c ‖c ai0 − r^(k−1)‖2 ≤ min_c ‖c ai − r^(k−1)‖2 for all i.
8) Set S^k = S^(k−1) ∪ {i0}.
9) Compute x^k = argmin_x̃ ‖Ax̃ − y‖2 subject to supp x̃ = S^k.
10) Compute r^k = y − Ax^k.
11) until ‖r^k‖2 < ε.
Output:
• Approximate solution x^k.
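The greedy scheme displayed above is Orthogonal Matching Pursuit (OMP) [57]; a compact Python rendering of it (a sketch of ours, using the residual-based stopping rule of step 11 together with an iteration cap) reads as follows.

```python
import numpy as np

def omp(A, y, eps=1e-6, max_iter=None):
    """Orthogonal Matching Pursuit: greedily add the column best matching the
    current residual, then re-fit by least squares on the enlarged support."""
    m, n = A.shape
    max_iter = max_iter or m
    x, support, r = np.zeros(n), [], y.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:
            break
        # step 7: the column whose span best approximates the residual
        corr = np.abs(A.T @ r) / np.linalg.norm(A, axis=0)
        support.append(int(np.argmax(corr)))
        # steps 9 and 10: least squares on the current support, update residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        r = y - A @ x
    return x
```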
A very recent development is message passing algorithms for compressed sensing, pioneered in [23];
a survey on those can be found in [32, Chapter 9].
6 Applications
We now turn to some applications of compressed sensing. Two of those we will discuss in
more detail, namely data separation and recovery of missing data.
ΦT2 x2 = x2 = c2
is also sparse. Since the mutual coherence of the matrix [ Φ1 | Φ2 ] can be computed to be 1/√n,
Theorem 3.8 implies the following result.
Theorem 6.1 ( [21, 30]) Let x1, x2 and Φ1, Φ2 be defined as in the previous paragraph,
and assume that ‖ΦT1 x1‖0 + ‖ΦT2 x2‖0 < ½(1 + √n). Then
[ ΦT1 x1 , ΦT2 x2 ]T = argmin(c1,c2) ‖ [ c1 , c2 ]T ‖1 subject to x = [ Φ1 | Φ2 ] [ c1 , c2 ]T.
x = x1 + x2 = Φ1 (ΦT1 x1 ) + Φ2 (ΦT2 x2 ).
This particular choice of coefficients – which are in frame theory language termed analysis
coefficients – leads to the minimization problem
[ x⋆1 , x⋆2 ]T = argmin(x̃1,x̃2) ‖ΦT1 x̃1‖1 + ‖ΦT2 x̃2‖1 subject to x = x̃1 + x̃2.    (4)
Interestingly, the associated recovery results employ structured sparsity, which is why we also
briefly present those. First, the notion of relative sparsity (cf. Definition 2.3) is adapted
to this situation.
Definition 6.2 Let Φ1 and Φ2 be Parseval frames for Rn with indexing sets {1, . . . , N1}
and {1, . . . , N2}, respectively, let Λi ⊂ {1, . . . , Ni}, i = 1, 2, and let δ > 0. Then the vectors
x1 and x2 are called δ-relatively sparse in Φ1 and Φ2 with respect to Λ1 and Λ2, if
‖1Λ1c ΦT1 x1‖1 + ‖1Λ2c ΦT2 x2‖1 ≤ δ.
Second, the notion of mutual coherence is adapted to structured sparsity as already dis-
cussed in Subsection 3.2.1. This leads to the following definition of cluster coherence.
Definition 6.3 Let Φ1 = (ϕ1i)i=1,…,N1 and Φ2 = (ϕ2j)j=1,…,N2 be Parseval frames for Rn, respectively,
and let Λ1 ⊂ {1, . . . , N1}. Then the cluster coherence µc(Λ1, Φ1; Φ2) of Φ1 and Φ2
with respect to Λ1 is defined by
µc(Λ1, Φ1; Φ2) = max_{j=1,…,N2} Σ_{i∈Λ1} |⟨ϕ1i, ϕ2j⟩|.
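When the frame elements are given as the columns of matrices, cluster coherence is simple to evaluate; the following sketch (ours) computes the quantity of Definition 6.3 for a toy pair of orthonormal bases of R16.

```python
import numpy as np

def cluster_coherence(Phi1, Phi2, Lambda1):
    """mu_c(Lambda1, Phi1; Phi2): for each column of Phi2, sum the absolute inner
    products with the columns of Phi1 indexed by Lambda1; take the maximum over j."""
    G = np.abs(Phi1[:, Lambda1].T @ Phi2)   # |<phi_{1i}, phi_{2j}>| for i in Lambda1
    return G.sum(axis=0).max()

# toy example: the standard basis versus a random orthonormal basis of R^16
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
print(cluster_coherence(np.eye(16), Q, [0, 1, 2]))
```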
The performance of the minimization problem (4) can then be analyzed as follows. It
should be emphasized that the clusters of significant coefficients Λ1 and Λ2 are a mere analysis
tool; the algorithm does not take those into account. Further, notice that the choice of those
sets is highly delicate in its impact on the separation estimate. For the proof of the result, we
refer to [22].
Theorem 6.4 ( [22]) Let x = x1 + x2 ∈ Rn , let Φ1 and Φ2 be Parseval frames for Rn with
indexing sets {1, . . . , N1 } and {1, . . . , N2 }, respectively, and let Λi ⊂ {1, . . . , Ni }, i = 1, 2.
Further, suppose that x1 and x2 are δ-relatively sparse in Φ1 and Φ2 with respect to Λ1 and
Λ2, and let [x⋆1 , x⋆2]T be a solution of the minimization problem (4). Then
‖x⋆1 − x1‖2 + ‖x⋆2 − x2‖2 ≤ 2δ / (1 − 2µc),
where µc = max{µc(Λ1, Φ1; Φ2), µc(Λ2, Φ2; Φ1)}.
Let us finally mention that data separation via compressed sensing has been applied, for
instance, in imaging sciences for the separation of point- and curvelike objects, a problem ap-
pearing in several areas such as in astronomical imaging when separating stars from filaments
and in neurobiological imaging when separating spines from dendrites. Figure 5 illustrates a
numerical result from [48] using wavelets (see [50]) and shearlets (see [46,47]) as sparsifying
frames. A theoretical foundation for separation of point- and curvelike objects by ℓ1 mini-
mization is developed in [22]. When considering thresholding as separation method for such
features, even stronger theoretical results could be proven in [45]. Moreover, a first analysis of
separation of cartoon and texture – very commonly present in natural images – was performed
in [44].
For more details on data separation using compressed sensing techniques, we refer to [32,
Chapter 11].
The original vector x can then be recovered via x = Φc. The solution of the inpainting
problem – a term used for the recovery of missing data in imaging science – was first
considered in [31].
Application of Theorem 3.8 provides a sufficient condition for missing data recovery to
succeed.
Theorem 6.5 ( [19]) Let x ∈ Rn, let W be a subspace of Rn, and let Φ be an orthonormal
basis for Rn. If ‖ΦT x‖0 < ½(1 + µ(PW Φ)−1), then
Employing relative sparsity and cluster coherence, an error analysis can be derived in a
similar way as before. For the proof, the reader might want to consult [42].
Theorem 6.6 ( [42]) Let x ∈ Rn , let Φ be a Parseval frame for Rn with indexing set
{1, . . . , N }, and let Λ ⊂ {1, . . . , N }. Further, suppose that x is δ-relatively sparse in Φ with
respect to Λ, and let x⋆ be a solution of the minimization problem (6). Then
‖x⋆ − x‖2 ≤ 2δ / (1 − 2µc),
Acknowledgements The author is grateful to the reviewers for many helpful suggestions which im-
proved the presentation of the paper. She would also like to thank Emmanuel Candès, David Donoho,
Michael Elad, and Yonina Eldar for various discussions on related topics, and Sadegh Jokar for pro-
ducing Figure 3. The author acknowledges support by the Einstein Foundation Berlin, by Deutsche
Forschungsgemeinschaft (DFG) Grants SPP-1324 KU 1446/13 and KU 1446/14, and by the DFG Research
Center MATHEON “Mathematics for key technologies” in Berlin.
References
[1] B. Adcock and A. C. Hansen. Generalized sampling and infinite dimensional compressed sensing.
Preprint, 2012.
[2] M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing overcom-
plete dictionaries for sparse representation. IEEE Trans. Signal Proc., 54:4311–4322, 2006.
[3] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the Restricted
Isometry Property for random matrices. Constr. Approx., 28:253–263, 2008.
[4] L. Borup, R. Gribonval, and M. Nielsen. Beyond coherence: Recovering structured time-frequency
representations. Appl. Comput. Harmon. Anal., 14:120–128, 2008.
[5] J. Bourgain, S. Dilworth, K. Ford, S. Konyagin, and D. Kutzarova. Explicit constructions of RIP
matrices and related problems. Duke Math. J., 159:145–185, 2011.
[6] B. Boufounos, G. Kutyniok, and H. Rauhut. Sparse recovery from combined fusion frame mea-
surements. IEEE Trans. Inform. Theory, 57:3864–387, 2011.
[7] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to
sparse modeling of signals and images. SIAM Rev., 51:34–81, 2009.
[8] E. J. Candès. The restricted isometry property and its implications for compressed sensing. C. R.
Acad. Sci. I, 346:589–592, 2008.
[9] E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall. Compressed Sensing with Coherent and
Redundant Dictionaries. Appl. Comput. Harmon. Anal., 31:59–73, 2011.
[10] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Found. of Comput.
Math., 9:717–772, 2008.
[11] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction
from highly incomplete Fourier information. IEEE Trans. Inform. Theory, 52:489–509, 2006.
[12] P. G. Casazza and G. Kutyniok. Finite Frames: Theory and Applications, Birkhäuser, Boston,
2012.
[13] P. G. Casazza, G. Kutyniok, and S. Li. Fusion Frames and Distributed Processing. Appl. Comput.
Harmon. Anal. 25:114–132, 2008.
[14] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J.
Sci. Comput., 20:33–61, 1998.
[15] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. J.
Am. Math. Soc., 22:211–231, 2009.
[16] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Comm. Pure Appl. Math., 57:1413–1457, 2004.
[17] R. DeVore. Deterministic constructions of compressed sensing matrices. J. Complexity, 23:918–
925, 2007.
[18] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52:1289–1306, 2006.
[19] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionar-
ies via l1 minimization, Proc. Natl. Acad. Sci. USA, 100:2197–2202, 2003.
[20] D. L. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations
in the presence of noise. IEEE Trans. Inform. Theory, 52:6–18, 2006.
[21] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans.
Inform. Theory, 47:2845–2862, 2001.
[22] D. L. Donoho and G. Kutyniok. Microlocal analysis of the geometric separation problem. Comm.
Pure Appl. Math., 66:1–47, 2013.
[23] D. L. Donoho, A. Maleki, and A. Montanari. Message passing algorithms for compressed sensing.
Proc. Natl. Acad. Sci. USA, 106:18914–18919, 2009.
[24] D. L. Donoho and P. B. Stark. Uncertainty principles and signal recovery. SIAM J. Appl. Math.,
49:906–931, 1989.
[25] D. L. Donoho and J. Tanner. Neighborliness of Randomly-Projected Simplices in High Dimen-
sions. Proc. Natl. Acad. Sci. USA, 102:9452–9457, 2005.
[26] D. L. Donoho and J. Tanner. Sparse Nonnegative Solutions of Underdetermined Linear Equations
by Linear Programming. Proc. Natl. Acad. Sci. USA, 102:9446–9451, 2005.
[27] D. L. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional ge-
ometry, with implications for modern data analysis and signal processing. Philos. Trans. Roy. Soc.
S.-A, 367:4273–4293, 2009.
[28] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck. Sparse Solution of Underdetermined Linear
Equations by Stagewise Orthogonal Matching Pursuit. Preprint, 2007.
[29] M. Elad. Sparse and Redundant Representations. Springer, New York, 2010.
[30] M. Elad and A. M. Bruckstein. A generalized uncertainty principle and sparse representation in
pairs of bases. IEEE Trans. Inform. Theory, 48:2558–2567, 2002.
[31] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho. Simultaneous cartoon and texture image in-
painting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal., 19:340–
358, 2005.
[32] Y. C. Eldar and G. Kutyniok. Compressed Sensing: Theory and Applications. Cambridge Univer-
sity Press, 2012.
[33] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction:
Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signa., 1:586–
597, 2007.
[34] S. Foucart. A note on guaranteed sparse recovery via ℓ1-minimization. Appl. Comput. Harmon.
Anal., 29:97–103, 2010.
[35] S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of ℓp -balls for 0 < p ≤ 1. J.
Complexity, 26:629–640, 2010.
[36] A. C. Gilbert, M. J. Strauss, and R. Vershynin. One sketch for all: Fast algorithms for Compressed
Sensing. In Proc. 39th ACM Symp. Theory of Computing (STOC), San Diego, CA, 2007.
[37] B. Grünbaum. Convex polytopes. Graduate Texts in Mathematics 221, Springer-Verlag, New York,
2003.
[38] M. Herman and T. Strohmer. High Resolution Radar via Compressed Sensing. IEEE Trans. Signal
Proc., 57:2275–2284, 2009.
[39] M. A. Iwen. Combinatorial Sublinear-Time Fourier Algorithms. Found. of Comput. Math.,
10:303–338, 2010.
[40] P. Jain, A. Tewari, and I. S. Dhillon. Orthogonal Matching Pursuit with Replacement. In Proc.
Neural Inform. Process. Systems Conf. (NIPS), 2011.
[41] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Con-
temp. Math., 26:189–206, 1984.
[42] E. King, G. Kutyniok, and X. Zhuang. Analysis of Inpainting via Clustered Sparsity and Microlo-
cal Analysis. J. Math. Imaging Vis., to appear.
[43] F. Krahmer and R. Ward. New and improved Johnson-Lindenstrauss embeddings via the Restricted
Isometry Property. SIAM J. Math. Anal., 43:1269–1281, 2011.
[44] G. Kutyniok. Clustered Sparsity and Separation of Cartoon and Texture. SIAM J. Imaging Sci. 6
(2013), 848-874.
[45] G. Kutyniok. Geometric Separation by Single-Pass Alternating Thresholding. Appl. Comput. Har-
mon. Anal., to appear.
[46] G. Kutyniok and D. Labate. Shearlets: Multiscale Analysis for Multivariate Data. Birkhäuser,
Boston, 2012.
[47] G. Kutyniok and W.-Q Lim. Compactly supported shearlets are optimally sparse. J. Approx. The-
ory, 163:1564–1589, 2011.
[48] G. Kutyniok and W.-Q Lim. Image separation using shearlets. In Curves and Surfaces (Avignon,
France, 2010), Lecture Notes in Computer Science 6920, Springer, 2012.
[49] Y. Lu and M. Do. Sampling signals from a union of subspaces. IEEE Signal Proc. Mag., 25:41–47,
2008.
[50] S. G. Mallat. A wavelet tour of signal processing: The sparse way. Academic Press, Inc., San
Diego, CA, 1998.
[51] S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal
Proc., 41:3397–3415, 1993.
[52] M. Mishali, Y. C. Eldar, and A. Elron. Xampling: Signal Acquisition and Processing in Union of
Subspaces. IEEE Trans. Signal Proc., 59:4719–4734, 2011.
[53] S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, Boston, MA,
2005.
[54] S. Nam, M. E. Davies, M. Elad, and R. Gribonval. The Cosparse Analysis Model and Algorithms.
Appl. Comput. Harmon. Anal., 34:30–56, 2013.
[55] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate
samples. Appl. Comput. Harmon. Anal., 26:301–321, 2008.
[56] D. Needell and R. Vershynin. Uniform Uncertainty Principle and signal recovery via Regularized
Orthogonal Matching Pursuit. Found. of Comput. Math., 9:317–334, 2009.
[57] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: Recursive function
approximation with applications to wavelet decomposition. In Proc. of the 27th Asilomar Confer-
ence on Signals, Systems and Computers, 1:40–44, 1993.
[58] G. Pfander, H. Rauhut, and J. Tropp. The restricted isometry property for time-frequency struc-
tured random matrices. Prob. Theory Rel. Fields, to appear.
[59] M. Pfetsch and A. Tillmann. The Computational Complexity of the Restricted Isometry Property,
the Nullspace Property, and Related Concepts in Compressed Sensing. Preprint, 2012.
[60] H. Rauhut, J. Romberg, and J. Tropp. Restricted isometries for partial random circulant matrices.
Appl. Comput. Harmon. Anal., 32:242–254, 2012.
[61] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries.
IEEE Trans. Inform. Theory, 54:2210–2219, 2008.
[62] C. J. Rozell and D. H. Johnson. Analysis of noise reduction in redundant expansions under dis-
tributed processing requirements. In Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 185–188, Philadelphia, PA, 2005.
[63] T. Strohmer and R. W. Heath. Grassmannian frames with applications to coding and communica-
tion. Appl. Comput. Harmon. Anal., 14:257–275, 2004.
[64] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform.
Theory, 50:2231–2242, 2004.
[65] W. Xu and B. Hassibi. Compressive Sensing over the Grassmann Manifold: a Unified Ana-
lytical Framework. In 46th Annual Allerton Conf. on Communication, Control, and Computing,
2008.