
GAMM-Mitteilungen, 10 July 2013

Theory and Applications of Compressed Sensing


Gitta Kutyniok ∗
Technische Universität Berlin
Institut für Mathematik
Straße des 17. Juni 136
10623 Berlin
Germany

Received XXXX, revised XXXX, accepted XXXX


Published online XXXX

Key words Dimension reduction, frames, greedy algorithms, ill-posed inverse problems, ℓ1
minimization, random matrices, sparse approximation, sparse recovery
MSC (2000) 94A12, 65F22, 94A20, 68U10, 90C25, 15B52
Compressed sensing is a novel research area, which was introduced in 2006, and since then
has already become a key concept in various areas of applied mathematics, computer science,
and electrical engineering. It surprisingly predicts that high-dimensional signals, which allow
a sparse representation by a suitable basis or, more generally, a frame, can be recovered from
what was previously considered highly incomplete linear measurements by using efficient
algorithms. This article shall serve as an introduction to and a survey about compressed
sensing.


1 Introduction
The area of compressed sensing was initiated in 2006 by two groundbreaking papers, namely
[18] by Donoho and [11] by Candès, Romberg, and Tao. Nowadays, after only 6 years,
an abundance of theoretical aspects of compressed sensing are explored in more than 1000
articles. Moreover, this methodology is to date extensively utilized by applied mathemati-
cians, computer scientists, and engineers for a variety of applications in astronomy, biology,
medicine, radar, and seismology, to name a few.
The key idea of compressed sensing is to recover a sparse signal from very few non-
adaptive, linear measurements by convex optimization. Taking a different viewpoint, it con-
cerns the exact recovery of a high-dimensional sparse vector after a dimension reduction step.
From a yet another standpoint, we can regard the problem as computing a sparse coefficient
vector for a signal with respect to an overcomplete system. The theoretical foundation of
compressed sensing has links with and also explores methodologies from various other fields
such as, for example, applied harmonic analysis, frame theory, geometric functional analysis,
numerical linear algebra, optimization theory, and random matrix theory.
It is interesting to notice that this development – the problem of sparse recovery – can in
fact be traced back to earlier papers from the 90s such as [24] and later the prominent papers
∗ E-mail: [email protected], Phone: +49 30 314 25758, Fax: +49 30 314 21604


by Donoho and Huo [21] and Donoho and Elad [19]. When the previously mentioned two
fundamental papers introducing compressed sensing were published, the term ‘compressed
sensing’ was initially utilized for random sensing matrices, since those allow for a minimal
number of non-adaptive, linear measurements. Nowadays, the terminology ‘compressed sens-
ing’ is more and more often used interchangeably with ‘sparse recovery’ in general, which is
a viewpoint we will also take in this survey paper.

1.1 The Compressed Sensing Problem


To state the problem mathematically precisely, let $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ be our signal of interest. As prior information, we either assume that x itself is sparse, i.e., it has very few non-zero coefficients in the sense that

$$\|x\|_0 := \#\{i : x_i \neq 0\}$$

is small, or that there exists an orthonormal basis or a frame¹ Φ such that x = Φc with c being sparse. For this, we let Φ be the matrix with the elements of the orthonormal basis or the frame as column vectors. In fact, a frame typically provides more flexibility than an orthonormal basis due to its redundancy and hence leads to improved sparsifying properties; for this reason, frames are customarily employed more often than orthonormal bases in this setting. Sometimes the notion of sparsity is weakened, which we for now – before making this precise in Section 2 – will refer to as approximately sparse. Further, let A be an m × n matrix, typically called the sensing matrix or measurement matrix. Throughout we will always assume that m < n and that A does not possess any zero columns, even if not explicitly mentioned.
Then the Compressed Sensing Problem can be formulated as follows: Recover x from
knowledge of

y = Ax,

or recover c from knowledge of

y = AΦc.

In both cases, we face an underdetermined linear system of equations with sparsity as prior information about the vector to be recovered – we do not, however, know the support, since otherwise the solution could be obtained trivially.
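To fix ideas numerically, the following minimal sketch (in Python with NumPy; the dimensions n, m, k below are arbitrary illustrative choices) sets up exactly this situation: a k-sparse vector, a random sensing matrix, and the incomplete measurements y = Ax.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 50, 5                      # ambient dimension, measurements, sparsity (illustrative)

# k-sparse signal: k randomly placed non-zero entries
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Random sensing matrix and the underdetermined measurements y = Ax
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

print(np.count_nonzero(x), y.shape)       # ||x||_0 = 5, y lives in R^50
```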
This leads us to the following questions:
• What are suitable signal and sparsity models?
• How, when, and with how much accuracy can the signal be algorithmically recovered?
• What are suitable sensing matrices?
In this section, we will discuss these questions briefly to build up intuition for the subsequent
sections.
¹ Recall that a frame for a Hilbert space $\mathcal{H}$ is a system $(\varphi_i)_{i \in I}$ in $\mathcal{H}$ for which there exist frame bounds $0 < A \le B < \infty$ such that $A\|x\|_2^2 \le \sum_{i \in I} |\langle x, \varphi_i \rangle|^2 \le B\|x\|_2^2$ for all $x \in \mathcal{H}$. A tight frame allows A = B. If A = B = 1 can be chosen, $(\varphi_i)_{i \in I}$ forms a Parseval frame. For further information, we refer to [12].


1.2 Sparsity: A Reasonable Assumption?


As a first consideration, one might question whether sparsity is indeed a reasonable assump-
tion. Due to the complexity of real data certainly only a heuristic answer is possible.
If a natural image is taken, it is well known that wavelets typically provide sparse approx-
imations. This is illustrated in Figure 1, which shows a wavelet decomposition [50] of an
exemplary image. It can clearly be seen that most coefficients are small in absolute value,
indicated by a darker color.

Fig. 1 (a) Mathematics building of TU Berlin (Photo by TU-Pressestelle); (b) Wavelet decomposition

Depending on the signal, a variety of representation systems which can be used to provide
sparse approximations is available and is constantly expanded. In fact, it was recently shown
that wavelet systems do not provide optimally sparse approximations in a regularity setting
which appears to be suitable for most natural images, but the novel system of shearlets does
[46, 47]. Hence, assuming some prior knowledge of the signal to be sensed or compressed,
typically suitable, well-analyzed representation systems are already at hand. If this is not the
case, more data sensitive methods such as dictionary learning algorithms (see, for instance,
[2]), in which a suitable representation system is computed for a given set of test signals, are
available.
Depending on the application at hand, often x is already sparse itself. Think, for instance,
of digital communication, when a cell phone network with n antennas and m users needs to
be modelled. Or consider genomics, when in a test study m genes shall be analyzed with n
patients taking part in the study. In the first scenario, very few of the users have an ongoing
call at a specific time; in the second scenario, very few of the genes are actually active. Thus,
x being sparse itself is also a very natural assumption.
In the compressed sensing literature, most results indeed assume that x itself is sparse, and
the problem y = Ax is considered. Very few articles study the problem of incorporating a
sparsifying orthonormal basis or frame; we mention specifically [9, 61]. In this paper, we
will also assume throughout that x is already a sparse vector. It should be emphasized that
‘exact’ sparsity is often too restricting or unnatural, and weakened sparsity notions need to be
taken into account. On the other hand, sometimes – such as with the tree structure of wavelet
coefficients – some structural information on the non-zero coefficients is known, which leads
to diverse structured sparsity models. Section 2 provides an overview of such models.

1.3 Recovery Algorithms: Optimization Theory and More


Let x now be a sparse vector. It is quite intuitive to recover x from knowledge of y by solving

$$(P_0) \qquad \min_x \|x\|_0 \quad \text{subject to} \quad y = Ax.$$

Due to the unavoidable combinatorial search, this algorithm is however NP-hard [53]. The main idea of Chen, Donoho, and Saunders in the fundamental paper [14] was to substitute the ℓ0 'norm' by the closest convex norm, which is the ℓ1 norm. This leads to the following minimization problem, which they coined Basis Pursuit:

$$(P_1) \qquad \min_x \|x\|_1 \quad \text{subject to} \quad y = Ax.$$

Due to the shape of the ℓ1 ball, ℓ1 minimization indeed promotes sparsity. For an illustration
of this fact, we refer the reader to Figure 2, in which ℓ1 minimization is compared to ℓ2 mini-
mization. We would also like to draw the reader’s attention to the small numerical example in
Figure 3, in which a partial Fourier matrix is chosen as measurement matrix.
Fig. 2 ℓ1 minimization versus ℓ2 minimization
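Basis Pursuit can be rewritten as a linear program by splitting x into its positive and negative parts, x = u − v with u, v ≥ 0, so that the ℓ1 norm becomes the sum of all entries of u and v. The following sketch (in Python with SciPy; this is our own illustration and not the solver used for Figure 3) recovers a sparse vector exactly from m < n random measurements.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, k = 120, 40, 4                          # illustrative sizes
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# (P1): min ||x||_1 s.t. Ax = y, as an LP over x = u - v with u, v >= 0
c = np.ones(2 * n)                            # objective sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                     # A(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

print("recovery error:", np.linalg.norm(x_hat - x_true))   # typically near machine precision
```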

The general question of when ‘ℓ0 = ℓ1 ’ holds is key to compressed sensing. Both necessary
and sufficient conditions have been provided, which not only depend on the sparsity of the
original vector x, but also on the incoherence of the sensing matrix A, which will be made
precise in Section 3.
Since for very large data sets ℓ1 minimization is often not feasible even when the solvers
are adapted to the particular structure of compressed sensing problems, various other types of
recovery algorithms were suggested. These can be roughly separated into convex optimiza-
tion, greedy, and combinatorial algorithms (cf. Section 5), each one having its own advantages
and disadvantages.

1.4 Sensing Matrices: How Much Freedom is Allowed?


As already mentioned, sensing matrices are required to satisfy certain incoherence conditions
such as, for instance, a small so-called mutual coherence. If we are allowed to choose the
sensing matrix freely, the best choice are random matrices such as Gaussian iid matrices,
uniform random ortho-projectors, or Bernoulli matrices, see for instance [11].

Fig. 3 (a) Original signal f with random sample points (indicated by circles); (b) The Fourier transform f̂; (c) Perfect recovery of f̂ by ℓ1 minimization; (d) Recovery of f̂ by ℓ2 minimization

It is still an open question (cf. Section 4 for more details) whether deterministic matrices
can be carefully constructed to have similar properties with respect to compressed sensing
problems. At the moment, different approaches towards this problem are being taken such
as structured random matrices by, for instance, Rauhut et al. in [58] or [60]. Moreover, most
applications do not allow for a free choice of the sensing matrix and enforce a particularly
structured matrix. Exemplary situations are the application of data separation, in which the
sensing matrix has to consist of two or more orthonormal bases or frames [32, Chapter 11],
or high resolution radar, for which the sensing matrix has to bear a particular time-frequency
structure [38].

1.5 Compressed Sensing: Quo Vadis?


At present, a comprehensive core theory seems established except for some few deep ques-
tions such as the construction of deterministic sensing matrices exhibiting properties similar
to random matrices.
One current main direction of research which can be identified with already various ex-
isting results is the incorporation of additional sparsity properties typically coined structured
sparsity, see Section 2 for references. Another main direction is the extension or transfer
of the Compressed Sensing Problem to other settings such as matrix completion, see for in-
stance [10]. Moreover, we are currently witnessing the diffusion of compressed sensing ideas
to various application areas such as radar analysis, medical imaging, distributed signal pro-
cessing, and data quantization, to name a few; see [32] for an overview. These applications

pose intriguing challenges to the area due to the constraints they impose, which in turn initiates novel theoretical problems. Finally, we observe that, due to the need for fast sparse recovery algorithms in particular, there is a trend to cooperate more closely with mathematicians from other research areas, for example from optimization theory, numerical linear algebra, or random matrix theory.
As three examples of recently initiated research directions, we would like to mention the
following. First, while the theory of compressed sensing focusses on digital data, it is desirable
to develop a similar theory for the continuum setting. Two promising approaches were so far
suggested by Eldar et al. (cf. [52]) and Adcock et al. (cf. [1]). Second, in contrast to Basis
Pursuit, which minimizes the ℓ1 norm of the synthesis coefficients, several approaches such as recovery of missing data minimize the ℓ1 norm of the analysis coefficients instead, see Subsections 6.1.2 and 6.2.2. The
relation between these two minimization problems is far from being clear, and the recently
introduced notion of co-sparsity [54] is an interesting approach to shed light onto this problem.
Third, the utilization of frames as a sparsifying system in the context of compressed sensing
has become a topic of increased interest, and we refer to the initial paper [9].
The reader might also want to consult the extensive webpage dsp.rice.edu/cs con-
taining most published papers in the area of compressed sensing subdivided into different
topics. We would also like to draw the reader’s attention to the recent books [29] and [32] as
well as the survey article [7].

1.6 Outline
In Section 2, we start by discussing different sparsity models including structured sparsity
and sparsifying dictionaries. The next section, Section 3, is concerned with presenting both
necessary and sufficient conditions for exact recovery using ℓ1 minimization as a recovery
strategy. The delicateness of designing sensing matrices is the focus of Section 4. In Section
5, other algorithmic approaches to sparse recovery are presented. Finally, applications such
as data separation are discussed in Section 6.

2 Signal Models
Sparsity is the prior information assumed of the vector we intend to efficiently sense or whose
dimension we intend to reduce, depending on which viewpoint we take. We will start by
recalling some classical notions of sparsity. Since applications typically impose a certain
structure on the significant coefficients, various structured sparsity models were introduced
which we will subsequently present. Finally, we will discuss how to ensure sparsity through
an appropriate orthonormal basis or frame.

2.1 Sparsity
The most basic notion of sparsity states that a vector has at most k non-zero coefficients. This is measured by the ℓ0 'norm', which for simplicity we will throughout refer to as a norm although it is well known that $\|\cdot\|_0$ does not constitute a mathematical norm.
Definition 2.1 A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is called k-sparse, if
$$\|x\|_0 = \#\{i : x_i \neq 0\} \le k.$$

The set of all k-sparse vectors is denoted by Σk .


We wish to emphasize that Σk is a highly non-linear set. Letting x ∈ Rn be a k-sparse
signal, it belongs to the linear subspace consisting of all vectors with the same support set.
Hence the set Σk is the union of all subspaces of vectors with support Λ satisfying |Λ| ≤ k.
From an application point of view, the situation of k-sparse vectors is however unrealistic,
wherefore various weaker versions were suggested. In the following definition we present one
possibility but do by no means claim this to be the most appropriate one. It might though be
very natural, since it analyzes the decay rate of the ℓp error of the best k-term approximation
of a vector.
Definition 2.2 Let $1 \le p < \infty$ and $r > 0$. A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is called p-compressible with constant C and rate r, if
$$\sigma_k(x)_p := \min_{\tilde{x} \in \Sigma_k} \|x - \tilde{x}\|_p \le C \cdot k^{-r} \quad \text{for any } k \in \{1, \ldots, n\}.$$
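For p ≥ 1 the minimum in this definition is attained by simply keeping the k entries of x that are largest in magnitude, so σ_k(x)_p is easy to evaluate. A small sketch (Python/NumPy; the function name is ours):

```python
import numpy as np

def best_k_term_error(x, k, p=2):
    """sigma_k(x)_p: l^p error of the best k-term approximation,
    obtained by keeping the k largest-magnitude entries of x."""
    if k <= 0:
        return np.linalg.norm(x, ord=p)
    idx_smallest = np.argsort(np.abs(x))[:-k]   # indices of the n-k smallest entries
    return np.linalg.norm(x[idx_smallest], ord=p)

# Example: a compressible vector whose sorted entries decay like 1/i
x = 1.0 / np.arange(1, 101)
print([round(best_k_term_error(x, k), 4) for k in (1, 5, 10, 20)])   # decaying errors
```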

2.2 Structured Sparsity


Typically, the non-zero or significant coefficients do not arise in arbitrary patterns but are
rather highly structured. Think of the coefficients of a wavelet decomposition which exhibit a
tree structure, see also Figure 1. To take these considerations into account, structured sparsity
models were introduced. A first idea might be to identify the clustered set of significant
coefficients [22]. An application of this notion will be discussed in Section 6.
In the following definition as well as in the sequel, for some vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ and some subset $\Lambda \subset \{1, \ldots, n\}$, the expression $1_\Lambda x$ will denote the vector in $\mathbb{R}^n$ defined by
$$(1_\Lambda x)_i = \begin{cases} x_i & : i \in \Lambda, \\ 0 & : i \notin \Lambda, \end{cases} \qquad i = 1, \ldots, n.$$
Moreover, $\Lambda^c$ will denote the complement of the set Λ in $\{1, \ldots, n\}$.


Definition 2.3 Let $\Lambda \subset \{1, \ldots, n\}$ and $\delta > 0$. A vector $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ is then called δ-relatively sparse with respect to Λ, if
$$\|1_{\Lambda^c} x\|_1 \le \delta.$$

The notion of k-sparsity can also be regarded from a more general viewpoint, which simul-
taneously imposes additional structure. Let x ∈ Rn be a k-sparse signal. Then it belongs to
the linear subspace consisting of all vectors with the same support set. Hence the set Σk is the
union of all subspaces of vectors with support Λ satisfying |Λ| ≤ k. Thus, a natural extension
of this concept is the following definition, initially introduced in [49].
Definition 2.4 A vector $x \in \mathbb{R}^n$ is said to belong to a union of subspaces, if there exists a family of subspaces $(W_j)_{j=1}^N$ in $\mathbb{R}^n$ such that
$$x \in \bigcup_{j=1}^{N} W_j.$$


At about the same time, the notion of fusion frame sparsity was introduced in [6]. Fusion frames are a set of subspaces having frame-like properties, thereby allowing for stability considerations. A family of subspaces $(W_j)_{j=1}^N$ in $\mathbb{R}^n$ is a fusion frame with bounds A and B, if
$$A\|x\|_2^2 \le \sum_{j=1}^{N} \|P_{W_j}(x)\|_2^2 \le B\|x\|_2^2 \quad \text{for all } x \in \mathbb{R}^n,$$

where PWj denotes the orthogonal projection onto the subspace Wj , see also [13] and [12,
Chapter 13]. Fusion frame theory extends classical frame theory by allowing the analy-
sis of signals through projections onto arbitrary dimensional subspaces as opposed to one-
dimensional subspaces in frame theory, hence serving also as a model for distributed process-
ing, cf. [62]. The notion of fusion frame sparsity then provides a more structured approach
than mere membership in a union of subspaces.
Applications such as manifold learning assume that the signal under consideration lives on
a general manifold, thereby forcing us to leave the world of linear subspaces. In such cases,
the signal class is often modeled as a non-linear k-dimensional manifold M in Rn , i.e.,

x ∈ M = {f (θ) : θ ∈ Θ}

with Θ being a k-dimensional parameter space. Such signals are then considered k-sparse in
the manifold model, see [65]. For a survey chapter about this topic, the interested reader is
referred to [32, Chapter 7].
We wish to finally mention that applications such as matrix completion require generaliza-
tions of vector sparsity by considering, for instance, low-rank matrix models. This is however
beyond the scope of this survey paper, and we refer to [32] for more details.

2.3 Sparsifying Dictionaries and Dictionary Learning


If the vector itself does not exhibit sparsity, we are required to sparsify it by choosing an
appropriate representation system – in this field typically coined dictionary. This problem
can be attacked in two ways, either non-adaptively or adaptively.
If certain characteristics of the signal are known, a dictionary can be chosen from the vast
class of already very well explored representation systems such as the Fourier basis, wavelets,
or shearlets, to name a few. The achieved sparsity might not be optimal, but various mathe-
matical properties of these systems are known and fast associated transforms are available.
Improved sparsity can be achieved by choosing the dictionary adaptive to the signals at
hand. For this, a test set of signals is required, based on which a dictionary is learnt. This
process is customarily termed dictionary learning. The most well-known and widely used al-
gorithm is the K-SVD algorithm introduced by Aharon, Elad, and Bruckstein in [2]. However,
from a mathematician’s point of view, this approach bears two problems which will hopefully
be both solved in the near future. First, almost no convergence results for such algorithms
are known. And, second, the learnt dictionaries do not exhibit any mathematically exploitable
structure, which makes not only an analysis very hard but also prevents the design of fast
associated transforms.

3 Conditions for Sparse Recovery


After having introduced various sparsity notions, in this sense signal models, we next consider
which conditions we need to impose on the sparsity of the original vector and on the sensing
matrix for exact recovery. For the sparse recovery method, we will focus on ℓ1 minimization
similar to most published results and refer to Section 5 for further algorithmic approaches.
In the sequel of the present section, several incoherence conditions for sensing matrices will
be introduced. Section 4 then discusses examples of matrices fulfilling those. Finally, we
mention that most results can be slightly modified to incorporate measurements affected by
additive noise, i.e., if $y = Ax + \nu$ with $\|\nu\|_2 \le \varepsilon$.

3.1 Uniqueness Conditions for Minimization Problems


We start by presenting conditions for uniqueness of the solutions to the minimization problems
(P0 ) and (P1 ) which we introduced in Subsection 1.3.

3.1.1 Uniqueness of (P0 )


The correct condition on the sensing matrix is phrased in terms of the so-called spark, whose
definition we first recall. This notion was introduced in [19] and verbally fuses the notions of
‘sparse’ and ‘rank’.
Definition 3.1 Let A be an m × n matrix. Then the spark of A denoted by spark(A) is the
minimal number of linearly dependent columns of A.
It is useful to reformulate this notion in terms of the null space of A, which we will through-
out denote by N (A), and state its range. The proof is obvious. For the definition of Σk , we
refer to Definition 2.1.
Lemma 3.2 Let A be an m × n matrix. Then
$$\mathrm{spark}(A) = \min\{k : \mathcal{N}(A) \cap \Sigma_k \neq \{0\}\}$$
and $\mathrm{spark}(A) \in [2, m + 1]$.


This notion enables us to derive an equivalent condition on unique solvability of (P0 ).
Since the proof is short, we state it for clarity purposes.
Theorem 3.3 ([19]) Let A be an m × n matrix, and let $k \in \mathbb{N}$. Then the following conditions are equivalent.

(i) If a solution x of $(P_0)$ satisfies $\|x\|_0 \le k$, then this is the unique solution.

(ii) $k < \mathrm{spark}(A)/2$.

Proof. (i) ⇒ (ii). We argue by contradiction. If (ii) does not hold, by Lemma 3.2 there exists some $h \in \mathcal{N}(A)$, $h \neq 0$, such that $\|h\|_0 \le 2k$. Thus, there exist x and $\tilde{x}$ satisfying $h = x - \tilde{x}$ and $\|x\|_0, \|\tilde{x}\|_0 \le k$, but $Ax = A\tilde{x}$, a contradiction to (i).
(ii) ⇒ (i). Let x and $\tilde{x}$ satisfy $y = Ax = A\tilde{x}$ and $\|x\|_0, \|\tilde{x}\|_0 \le k$. Thus $x - \tilde{x} \in \mathcal{N}(A)$ and $\|x - \tilde{x}\|_0 \le 2k < \mathrm{spark}(A)$. By Lemma 3.2, it follows that $x - \tilde{x} = 0$, which implies (i).
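Computing the spark is combinatorial in general, but for toy examples a brute-force search directly mirrors Lemma 3.2. A purely illustrative sketch (Python/NumPy, exponential in n):

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force)."""
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:   # dependent columns found
                return k
    return n + 1   # all columns linearly independent (only possible if m >= n)

A = np.array([[1., 0., 1., 1.],
              [0., 1., 1., -1.]])
print(spark(A))   # no two columns are parallel, so spark(A) = 3 = m + 1
```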

3.1.2 Uniqueness of (P1 )


Due to the underdeterminedness of A and hence the ill-posedness of the recovery problem, in
the analysis of uniqueness of the minimization problem (P1 ), the null space of A also plays a
particular role. The related so-called null space property, first introduced in [15], is defined as
follows.
Definition 3.4 Let A be an m × n matrix. Then A has the null space property (NSP) of order k, if, for all $h \in \mathcal{N}(A) \setminus \{0\}$ and for all index sets Λ with $|\Lambda| \le k$,
$$\|1_\Lambda h\|_1 < \tfrac{1}{2}\|h\|_1.$$

An equivalent condition for the existence of a unique sparse solution of (P1 ) can now be
stated in terms of the null space property. For the proof, we refer to [15].
Theorem 3.5 ( [15]) Let A be an m × n matrix, and let k ∈ N. Then the following
conditions are equivalent.

(i) If a solution x of $(P_1)$ satisfies $\|x\|_0 \le k$, then this is the unique solution.

(ii) A satisfies the null space property of order k.

It should be emphasized that [15] studies the Compressed Sensing Problem in a much more
general way by analyzing quite general encoding-decoding strategies.

3.2 Sufficient Conditions


The core of compressed sensing is to determine when ‘ℓ0 = ℓ1 ’, i.e., when the solutions of
(P0 ) and (P1 ) coincide. The most well-known sufficient conditions for this to hold true are
phrased in terms of mutual coherence and of the restricted isometry property.

3.2.1 Mutual Coherence


The mutual coherence of a matrix, initially introduced in [21], measures the smallest angle
between each pair of its columns.
Definition 3.6 Let $A = (a_i)_{i=1}^n$ be an m × n matrix. Then its mutual coherence µ(A) is defined as
$$\mu(A) = \max_{i \neq j} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}.$$

The maximal mutual coherence of a matrix certainly equals 1 in the case when two columns
are linearly dependent. The lower bound presented in the next result, also known as the
Welch bound, is more interesting. It can be shown that it is attained by so-called optimal
Grassmannian frames [63], see also Section 4.
Lemma 3.7 Let A be an m × n matrix. Then we have
$$\mu(A) \in \left[\sqrt{\frac{n-m}{m(n-1)}}, \, 1\right].$$
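In contrast to the spark and the RIP constant, the mutual coherence is cheap to compute. The following sketch (Python/NumPy) evaluates µ(A) for a Gaussian random matrix and compares it with the Welch bound of Lemma 3.7.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    G = A / np.linalg.norm(A, axis=0)      # normalize the columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)            # ignore <a_i, a_i> on the diagonal
    return gram.max()

rng = np.random.default_rng(0)
m, n = 64, 256
A = rng.standard_normal((m, n))
welch = np.sqrt((n - m) / (m * (n - 1)))
print(f"mu(A) = {mutual_coherence(A):.3f} >= Welch bound = {welch:.3f}")
```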


Let us mention that different variants of mutual coherence exist, in particular, the Babel
function [19], the cumulative coherence function [64], the structured p-Babel function [4], the
fusion coherence [6], and cluster coherence [22]. The notion of cluster coherence will in fact
be later discussed in Section 6 for a particular application.
Imposing a bound on the sparsity of the original vector by the mutual coherence of the
sensing matrix, the following result can be shown; its proof can be found in [19].
Theorem 3.8 ([19, 30]) Let A be an m × n matrix, and let $x \in \mathbb{R}^n \setminus \{0\}$ be a solution of $(P_0)$ satisfying
$$\|x\|_0 < \tfrac{1}{2}\bigl(1 + \mu(A)^{-1}\bigr).$$
Then x is the unique solution of $(P_0)$ and $(P_1)$.

3.2.2 Restricted Isometry Property


We next discuss the restricted isometry property, initially introduced in [11]. It measures the
degree to which each submatrix consisting of k column vectors of A is close to being an
isometry. Notice that this notion automatically ensures stability, as will become evident in the
next theorem.
Definition 3.9 Let A be an m × n matrix. Then A has the Restricted Isometry Property (RIP) of order k, if there exists a $\delta_k \in (0, 1)$ such that
$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2 \quad \text{for all } x \in \Sigma_k.$$
Several variations of this notion were introduced during the last years, of which examples
are the fusion RIP [6] and the D-RIP [9].
Although error estimates for recovery from noisy data are also known in terms of mutual coherence, in the setting of the RIP they are very natural. In fact, the error can be phrased in terms of the best k-term approximation (cf. Definition 2.2) as follows.
Theorem 3.10 ([8, 15]) Let A be an m × n matrix which satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$. Let $x \in \mathbb{R}^n$, and let $\hat{x}$ be a solution of the associated $(P_1)$ problem. Then
$$\|x - \hat{x}\|_2 \le C \cdot \frac{\sigma_k(x)_1}{\sqrt{k}}$$
for some constant C dependent on $\delta_{2k}$.
The best known RIP condition for sparse recovery by (P1 ) states that (P1 ) recovers all
k-sparse vectors provided the measurement matrix A satisfies δ2k < 0.473, see [34].

3.3 Necessary Conditions


Meaningful necessary conditions for ‘ℓ0 = ℓ1 ’ in the sense of (P0 ) = (P1 ) are significantly
harder to achieve. An interesting string of research was initiated by Donoho and Tanner with
the two papers [25, 26]. The main idea is to derive equivalent conditions utilizing the theory
of convex polytopes. For this, let $C^n$ be defined by
$$C^n = \{x \in \mathbb{R}^n : \|x\|_1 \le 1\}. \tag{1}$$
A condition equivalent to ‘ℓ0 = ℓ1 ’ can then be formulated in terms of properties of a partic-
ular related polytope. For the relevant notions from polytope theory, we refer to [37].

Theorem 3.11 ([25, 26]) Let $C^n$ be defined as in (1), let A be an m × n matrix, and let the polytope P be defined by $P = AC^n \subseteq \mathbb{R}^m$. Then the following conditions are equivalent.
(i) The number of k-faces of P equals the number of k-faces of $C^n$.
(ii) $(P_0) = (P_1)$.
The geometric intuition behind this result is the fact that the number of k-faces of P equals
the number of indexing sets Λ ⊆ {1, . . . , n} with |Λ| = k such that vectors x satisfying
supp x = Λ can be recovered via (P1 ).
Extending these techniques, Donoho and Tanner were also able to provide highly accurate
analytical descriptions of the occurring phase transition when considering the area of exact recovery dependent on the ratio of the number of equations to the number of unknowns, m/n, versus the ratio of the number of non-zeros to the number of equations, k/m. The interested
reader is referred to [27] for further details.

4 Sensing Matrices
Ideally, we aim for a matrix which has high spark, low mutual coherence, and a small RIP
constant. As our discussion in this section will show, these properties are often quite difficult
to achieve, and even computing, for instance, the RIP constant is computationally intractable
in general (see [59]).
In the sequel, after presenting some general relations between the introduced notions of
spark, NSP, mutual coherence, and RIP, we will discuss some explicit constructions for, in
particular, mutual coherence and RIP.

4.1 Relations between Spark, NSP, Mutual Coherence, and RIP


Before discussing different approaches to construct a sensing matrix, we first present several
known relations between the introduced notions spark, NSP, mutual coherence, and RIP. This
allows one to easily compute or at least estimate other measures, if a sensing matrix is designed
for a particular measure. For the proofs of the different statements, we refer to [32, Chapter
1].
Lemma 4.1 Let A be an m × n matrix with normalized columns.
(i) We have
$$\mathrm{spark}(A) \ge 1 + \frac{1}{\mu(A)}.$$

(ii) A satisfies the RIP of order k with $\delta_k = k\mu(A)$ for all $k < \mu(A)^{-1}$.

(iii) Suppose A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$. If
$$\frac{\sqrt{2}\,\delta_{2k}}{1 - (1 + \sqrt{2})\delta_{2k}} < \sqrt{\frac{k}{n}},$$

then A satisfies the NSP of order 2k.


4.2 Spark and Mutual Coherence


Let us now provide some exemplary classes of sensing matrices with advantageous spark and
mutual coherence properties.
The first observation one can make (see also [15]) is that an m × n Vandermonde matrix
A satisfies
spark(A) = m + 1.
One serious drawback though is the fact that these matrices become badly conditioned as
n → ∞.
Turning to the weaker notion of mutual coherence, of particular interest – compare Sub-
section 6.1 – are sensing matrices composed of two orthonormal bases or frames for Rm . If
the two orthonormal bases Φ1 and Φ2 , say, are chosen to be mutually unbiased such as the
Fourier and the Dirac basis (the standard basis), then
$$\mu([\,\Phi_1 \,|\, \Phi_2\,]) = \frac{1}{\sqrt{m}},$$
which is the optimal bound on mutual coherence for such types of m × 2m sensing matrix.
Other constructions are known for $m \times m^2$ matrices A generated from the Alltop sequence [38] or by using Grassmannian frames [63], in which cases the optimal lower bound is attained:
$$\mu(A) = \frac{1}{\sqrt{m}}.$$
The number of measurements required for recovery of a k-sparse signal can then be determined to be $m = O(k^2 \log n)$.

4.3 RIP
We begin by discussing some deterministic constructions of matrices satisfying the RIP. The
first noteworthy construction was presented by DeVore and requires $m \gtrsim k^2$, see [17]. A very recent, highly sophisticated approach [5] by Bourgain et al. still requires $m \gtrsim k^{2-\alpha}$ with some small constant α. Hence up to now deterministic constructions require a large m, which is typically not feasible for applications, since it scales quadratically in k.
The construction of random sensing matrices satisfying RIP is a possibility to circumvent
this problem. Such constructions are closely linked to the famous Johnson-Lindenstrauss
Lemma, which is extensively utilized in numerical linear algebra, machine learning, and other
areas requiring dimension reduction.
Theorem 4.2 (Johnson-Lindenstrauss Lemma [41]) Let $\varepsilon \in (0, 1)$, let $x_1, \ldots, x_p \in \mathbb{R}^n$, and let $m = O(\varepsilon^{-2} \log p)$ be a positive integer. Then there exists a Lipschitz map $f : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$(1 - \varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1 + \varepsilon)\|x_i - x_j\|_2^2 \quad \text{for all } i, j \in \{1, \ldots, p\}.$$
The key requirement for a matrix to satisfy the Johnson-Lindenstrauss Lemma with high probability is the following concentration inequality for an arbitrarily fixed $x \in \mathbb{R}^n$:
$$\mathbb{P}\bigl((1 - \varepsilon)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \varepsilon)\|x\|_2^2\bigr) \ge 1 - 2e^{-c_0 \varepsilon^2 m}, \tag{2}$$


with the entries of A being generated by a certain probability distribution. The relation of RIP
to the Johnson-Lindenstrauss Lemma is established in the following result. We also mention
that recently even a converse of the following theorem was proved in [43].
Theorem 4.3 ([3]) Let $\delta \in (0, 1)$. If the probability distribution generating the m × n matrices A satisfies the concentration inequality (2) with $\varepsilon = \delta$, then there exist constants $c_1, c_2$ such that, with probability $\ge 1 - 2e^{-c_2 \delta^2 m}$, A satisfies the RIP of order k with δ for all $k \le c_1 \delta^2 m / \log(n/k)$.
This observation was then used in [3] to prove that Gaussian and Bernoulli random matrices satisfy the RIP of order k with δ provided that $m \gtrsim \delta^{-2} k \log(n/k)$. Up to a constant, lower bounds for Gelfand widths of ℓ1-balls [35] show that this dependence on k and n is indeed optimal.
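For a Gaussian matrix with i.i.d. N(0, 1/m) entries, the concentration inequality (2) is easy to observe empirically. A small Monte Carlo sketch (Python/NumPy; the parameters are illustrative, and the empirical frequency is not the constant c0 of (2)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps, trials = 400, 100, 0.3, 2000

x = rng.standard_normal(n)                 # arbitrary fixed vector
x_norm2 = np.linalg.norm(x) ** 2

hits = 0
for _ in range(trials):
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # E ||Ax||_2^2 = ||x||_2^2
    Ax_norm2 = np.linalg.norm(A @ x) ** 2
    hits += (1 - eps) * x_norm2 <= Ax_norm2 <= (1 + eps) * x_norm2

print(f"empirical probability: {hits / trials:.3f}")   # close to 1, as (2) predicts
```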

5 Recovery Algorithms
In this section, we will provide a brief overview of the different types of algorithms typically
used for sparse recovery. Convex optimization algorithms require very few measurements but
are computationally more complex. On the other extreme are combinatorial algorithms, which
are very fast – often sublinear – but require many measurements that are sometimes difficult
to obtain. Greedy algorithms are in some sense a good compromise between those extremes
concerning computational complexity and the required number of measurements.

5.1 Convex Optimization


In Subsection 1.3, we already stated the convex optimization problem

$$\min_x \|x\|_1 \quad \text{subject to} \quad y = Ax$$

most commonly used. If the measurements are affected by noise, a conic constraint is required; i.e., the minimization problem needs to be changed to

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2^2 \le \varepsilon,$$

for a carefully chosen ε > 0. For a particular regularization parameter λ > 0, this problem is equivalent to the unconstrained version given by

$$\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1.$$

Developed convex optimization algorithms specifically adapted to the compressed sensing setting include interior-point methods [11], projected gradient methods [33], and iterative thresholding [16]. The reader might also be interested to check the webpages www-stat.stanford.edu/~candes/l1magic and sparselab.stanford.edu for available code. It is worth pointing out that the intense research performed in this area has slightly diminished the computational disadvantage of convex optimization algorithms for compressed sensing as compared to greedy type algorithms.
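As one concrete representative of the iterative thresholding family [16], the following sketch implements ISTA (iterative soft-thresholding) for the unconstrained problem above (Python/NumPy; the step size, λ, and problem sizes are illustrative choices of ours).

```python
import numpy as np

def ista(A, y, lam, iters=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||_2^2 + lam*||x||_1."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2            # step size 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x + t * (A.T @ (y - A @ x))            # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)   # soft thresholding
    return x

rng = np.random.default_rng(2)
n, m, k = 200, 60, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = ista(A, A @ x_true, lam=0.01)
print("error:", np.linalg.norm(x_hat - x_true))    # small, up to the bias induced by lam
```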

5.2 Greedy Algorithms


Greedy algorithms iteratively approximate the coefficients and the support of the original sig-
nal. They have the advantage of being very fast and easy to implement. Often the theoretical
performance guarantees are very similar to, for instance, ℓ1 minimization results.
The most well-known greedy approach is Orthogonal Matching Pursuit, which is described
in Figure 4. OMP was introduced in [57] as an improved successor of Matching Pursuit [51].

Input:
• Matrix $A = (a_i)_{i=1}^n \in \mathbb{R}^{m \times n}$ and measurement vector $y \in \mathbb{R}^m$.
• Error threshold ε.
Algorithm:
1) Set k = 0.
2) Set the initial solution $x^0 = 0$.
3) Set the initial residual $r^0 = y - Ax^0 = y$.
4) Set the initial support $S^0 = \mathrm{supp}\, x^0 = \emptyset$.
5) Repeat
6) Set k = k + 1.
7) Choose $i_0$ such that $\min_c \|c\,a_{i_0} - r^{k-1}\|_2 \le \min_c \|c\,a_i - r^{k-1}\|_2$ for all i.
8) Set $S^k = S^{k-1} \cup \{i_0\}$.
9) Compute $x^k = \mathrm{argmin}_x \|Ax - y\|_2$ subject to $\mathrm{supp}\, x = S^k$.
10) Compute $r^k = y - Ax^k$.
11) until $\|r^k\|_2 < \varepsilon$.
Output:
• Approximate solution $x^k$.

Fig. 4 Orthogonal Matching Pursuit (OMP): Approximation of the solution of $(P_0)$
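A direct transcription of Figure 4 into code might look as follows (a sketch in Python/NumPy; step 7 uses the fact that minimizing $\|c\,a_i - r\|_2$ over c is equivalent to maximizing $|\langle a_i, r\rangle|/\|a_i\|_2$, and step 9 is solved by least squares).

```python
import numpy as np

def omp(A, y, eps=1e-6, max_iter=None):
    """Orthogonal Matching Pursuit following Figure 4."""
    m, n = A.shape
    max_iter = max_iter or m
    x = np.zeros(n)
    residual = y.copy()
    support = []
    col_norms = np.linalg.norm(A, axis=0)
    for _ in range(max_iter):
        # Step 7: column whose span best approximates the current residual
        i0 = int(np.argmax(np.abs(A.T @ residual) / col_norms))
        support.append(i0)                                   # Step 8
        # Step 9: least squares restricted to the current support
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coeffs
        residual = y - A @ x                                 # Step 10
        if np.linalg.norm(residual) < eps:                   # Step 11
            break
    return x

rng = np.random.default_rng(3)
n, m, k = 200, 60, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
print("error:", np.linalg.norm(omp(A, A @ x_true) - x_true))   # typically near machine precision
```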

Interestingly, a theorem similar to Theorem 3.8 can be proven for OMP.


Theorem 5.1 ([20, 64]) Let A be an m × n matrix, and let $x \in \mathbb{R}^n \setminus \{0\}$ be a solution of $(P_0)$ satisfying
$$\|x\|_0 < \tfrac{1}{2}\bigl(1 + \mu(A)^{-1}\bigr).$$
Then OMP with error threshold ε = 0 recovers x.


Other prominent examples of greedy algorithms are Stagewise OMP (StOMP) [28], Regu-
larized OMP (ROMP) [56], and Compressive Sampling MP (CoSaMP) [55]. For a survey of
these methods, we wish to refer to [32, Chapter 8].
An intriguing, very recently developed class of algorithms is Orthogonal Matching Pursuit with Replacement (OMPR) [40], which not only includes most iterative (hard) thresholding algorithms as special cases, but also permits the tightest known analysis in terms of RIP conditions. By extending OMPR using locality sensitive hashing (OMPR-Hash), this approach also leads to the first provably sub-linear algorithm for sparse recovery, see [40]. Another

recent development is message passing algorithms for compressed sensing pioneered in [23];
a survey on those can be found in [32, Chapter 9].

5.3 Combinatorial Algorithms


These methods apply group testing to highly structured samples of the original signal, but are
far less used in compressed sensing as opposed to convex optimization and greedy algorithms.
From the various types of algorithms, we mention the HHS pursuit [36] and a sub-linear
Fourier transform [39].

6 Applications
We now turn to some applications of compressed sensing. Two of those we will discuss in
more detail, namely data separation and recovery of missing data.

6.1 Data Separation


The data separation problem can be stated in the following way. Let x = x1 + x2 ∈ Rn .
Assuming we are just given x, how can we extract x1 and x2 from it? At first glance, this
seems to be impossible, since there are two unknowns for one datum.

6.1.1 An Orthonormal Basis Approach


The first approach to apply compressed sensing techniques consists in choosing appropriate orthonormal bases $\Phi_1$ and $\Phi_2$ for $\mathbb{R}^n$ such that the coefficient vectors $\Phi_i^T x_i$ (i = 1, 2) are sparse. This leads to the following underdetermined linear system of equations:
$$x = [\,\Phi_1 \,|\, \Phi_2\,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
Compressed sensing now suggests to solve
$$\min_{c_1, c_2} \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_1 \quad \text{subject to} \quad x = [\,\Phi_1 \,|\, \Phi_2\,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}. \tag{3}$$
If the sparse vector $[\Phi_1^T x_1, \Phi_2^T x_2]^T$ can be recovered, the data separation problem can be solved by computing
$$x_1 = \Phi_1(\Phi_1^T x_1) \quad \text{and} \quad x_2 = \Phi_2(\Phi_2^T x_2).$$
Obviously, separation can only be achieved provided that the components x1 and x2 are in
some sense morphologically distinct. Notice that this property is indeed encoded in the prob-
lem if one requires incoherence of the matrix [ Φ1 | Φ2 ].
In fact, this type of problem can be regarded as the birth of compressed sensing, since the
fundamental paper [21] by Donoho and Huo analyzed a particular data separation problem,
namely the separation of sinusoids and spikes. In this setting, x1 consists of n samples of a
continuum domain signal which is a superposition of sinusoids:
$$x_1 = \left( \frac{1}{\sqrt{n}} \sum_{\omega=0}^{n-1} c_{1,\omega}\, e^{2\pi i \omega t / n} \right)_{0 \le t \le n-1}.$$


Letting $\Phi_1$ be the Fourier basis, the coefficient vector
$$\Phi_1^T x_1 = c_1, \quad \text{where } \Phi_1 = [\varphi_{1,0} | \ldots | \varphi_{1,n-1}] \text{ with } \varphi_{1,\omega} = \left( \tfrac{1}{\sqrt{n}}\, e^{2\pi i \omega t / n} \right)_{0 \le t \le n-1},$$
is sparse. The vector $x_2$ consists of n samples of a continuum domain signal which is a superposition of spikes, i.e., has few non-zero coefficients. Thus, letting $\Phi_2$ denote the Dirac basis (standard basis), the coefficient vector
$$\Phi_2^T x_2 = x_2 = c_2$$
is also sparse. Since the mutual coherence of the matrix $[\,\Phi_1 \,|\, \Phi_2\,]$ can be computed to be $\tfrac{1}{\sqrt{n}}$,
Theorem 3.8 implies the following result.
Theorem 6.1 ([21, 30]) Let $x_1, x_2$ and $\Phi_1, \Phi_2$ be defined as in the previous paragraph, and assume that $\|\Phi_1^T x_1\|_0 + \|\Phi_2^T x_2\|_0 < \tfrac{1}{2}(1 + \sqrt{n})$. Then
$$\begin{bmatrix} \Phi_1^T x_1 \\ \Phi_2^T x_2 \end{bmatrix} = \mathrm{argmin}_{c_1, c_2} \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_1 \quad \text{subject to} \quad x = [\,\Phi_1 \,|\, \Phi_2\,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
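The coherence claim underlying Theorem 6.1 is easy to check numerically: the sketch below (Python/NumPy) builds the combined Fourier–Dirac dictionary [Φ1 | Φ2] and verifies that its mutual coherence equals 1/√n.

```python
import numpy as np

n = 64
t = np.arange(n)
Phi1 = np.exp(2j * np.pi * np.outer(t, t) / n) / np.sqrt(n)   # discrete Fourier basis (columns)
Phi2 = np.eye(n)                                              # Dirac (standard) basis
D = np.hstack([Phi1, Phi2])

G = np.abs(D.conj().T @ D)          # magnitudes of all pairwise inner products
np.fill_diagonal(G, 0.0)            # discard the trivial diagonal entries
print(G.max(), 1 / np.sqrt(n))      # both equal 1/sqrt(n) = 0.125
```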

6.1.2 A Frame Approach


Now assume that we cannot find sparsifying orthonormal bases but Parseval frames2 Φ1 and
Φ2 – notice that this situation is much more likely due to the advantageous redundancy of a
frame. In this case, the minimization problem we stated in (3) faces the following problem:
We are merely interested in the separation x = x1 + x2 . However, for each such separation,
due to the redundancy of the frames the minimization problem searches through infinitely
many coefficients [c1 , c2 ]T satisfying xi = Φi ci , i = 1, 2. Thus it computes not only much
more than necessary – in fact, it even computes the sparsest coefficient sequence of x with re-
spect to the dictionary [Φ1 |Φ2 ] – but this also causes numerical instabilities if the redundancy
of the frames is too high.
To avoid this problem, we place the ℓ1 norm on the analysis, rather than the synthesis side, as already mentioned in Subsection 1.5. Utilizing the fact that $\Phi_1$ and $\Phi_2$ are Parseval frames, i.e., that $\Phi_i \Phi_i^T = I$ (i = 1, 2), we can write
$$x = x_1 + x_2 = \Phi_1(\Phi_1^T x_1) + \Phi_2(\Phi_2^T x_2).$$
This particular choice of coefficients – which are in frame theory language termed analysis coefficients – leads to the minimization problem
$$\min_{\tilde{x}_1, \tilde{x}_2} \|\Phi_1^T \tilde{x}_1\|_1 + \|\Phi_2^T \tilde{x}_2\|_1 \quad \text{subject to} \quad x = \tilde{x}_1 + \tilde{x}_2. \tag{4}$$

Interestingly, the associated recovery results employ structured sparsity, wherefore we will
also briefly present those. First, the notion of relative sparsity (cf. Definition 2.3) is adapted
to this situation.

² Recall that Φ is a Parseval frame, if $\Phi\Phi^T = I$.


Definition 6.2 Let $\Phi_1$ and $\Phi_2$ be Parseval frames for $\mathbb{R}^n$ with indexing sets $\{1, \ldots, N_1\}$ and $\{1, \ldots, N_2\}$, respectively, let $\Lambda_i \subset \{1, \ldots, N_i\}$, i = 1, 2, and let δ > 0. Then the vectors $x_1$ and $x_2$ are called δ-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$, if
$$\|1_{\Lambda_1^c} \Phi_1^T x_1\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2\|_1 \le \delta.$$

Second, the notion of mutual coherence is adapted to structured sparsity as already discussed in Subsection 3.2.1. This leads to the following definition of cluster coherence.
Definition 6.3 Let $\Phi_1 = (\varphi_{1i})_{i=1}^{N_1}$ and $\Phi_2 = (\varphi_{2j})_{j=1}^{N_2}$ be Parseval frames for $\mathbb{R}^n$, and let $\Lambda_1 \subset \{1, \ldots, N_1\}$. Then the cluster coherence $\mu_c(\Lambda_1, \Phi_1; \Phi_2)$ of $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ is defined by
$$\mu_c(\Lambda_1, \Phi_1; \Phi_2) = \max_{j=1,\ldots,N_2} \sum_{i \in \Lambda_1} |\langle \varphi_{1i}, \varphi_{2j} \rangle|.$$

The performance of the minimization problem (4) can then be analyzed as follows. It
should be emphasized that the clusters of significant coefficients Λ1 and Λ2 are a mere analysis
tool; the algorithm does not take those into account. Further, notice that the choice of those
sets is highly delicate in its impact on the separation estimate. For the proof of the result, we
refer to [22].
Theorem 6.4 ([22]) Let $x = x_1 + x_2 \in \mathbb{R}^n$, let $\Phi_1$ and $\Phi_2$ be Parseval frames for $\mathbb{R}^n$ with indexing sets $\{1, \ldots, N_1\}$ and $\{1, \ldots, N_2\}$, respectively, and let $\Lambda_i \subset \{1, \ldots, N_i\}$, i = 1, 2. Further, suppose that $x_1$ and $x_2$ are δ-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$, and let $[x_1^\star, x_2^\star]^T$ be a solution of the minimization problem (4). Then
$$\|x_1^\star - x_1\|_2 + \|x_2^\star - x_2\|_2 \le \frac{2\delta}{1 - 2\mu_c},$$
where $\mu_c = \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Lambda_2, \Phi_2; \Phi_1)\}$.
Let us finally mention that data separation via compressed sensing has been applied, for
instance, in imaging sciences for the separation of point- and curvelike objects, a problem ap-
pearing in several areas such as in astronomical imaging when separating stars from filaments
and in neurobiological imaging when separating spines from dendrites. Figure 5 illustrates a
numerical result from [48] using wavelets (see [50]) and shearlets (see [46,47]) as sparsifying
frames. A theoretical foundation for separation of point- and curvelike objects by ℓ1 mini-
mization is developed in [22]. When considering thresholding as separation method for such
features, even stronger theoretical results could be proven in [45]. Moreover, a first analysis of
separation of cartoon and texture – very commonly present in natural images – was performed
in [44].
For more details on data separation using compressed sensing techniques, we refer to [32,
Chapter 11].

6.2 Recovery of Missing Data


The problem of recovery of missing data can be formulated as follows. Let $x = x_K + x_M \in W \oplus W^\perp$, where W is a subspace of $\mathbb{R}^n$. We assume only $x_K$ is known to us, and we aim to recover x. Again, this seems infeasible unless we have additional information.

Fig. 5 Separation of a neurobiological image using wavelets and shearlets [48]

6.2.1 An Orthonormal Basis Approach


We now assume that – although x is not known to us – we at least know that it is sparsified by an orthonormal basis Φ, say. Letting $P_W$ and $P_{W^\perp}$ denote the orthogonal projections onto W and $W^\perp$, respectively, we are led to solve the underdetermined problem
$$P_W \Phi c = P_W x$$
for the sparse solution c. As in the case of data separation, from a compressed sensing viewpoint it is suggestive to solve
$$\min_c \|c\|_1 \quad \text{subject to} \quad P_W \Phi c = P_W x. \tag{5}$$

The original vector x can then be recovered via x = Φc. The solution of the inpainting
problem – a terminology used for recovery of missing data in imaging science – was first
considered in [31].
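Note that (5) is nothing but Basis Pursuit with sensing matrix $P_W \Phi$: if W is spanned by the standard basis vectors of the observed coordinates, $P_W$ simply selects the known rows. A minimal sketch (Python with SciPy; the DCT basis and all sizes are our own illustrative choices):

```python
import numpy as np
from scipy.fft import dct
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, k, n_missing = 128, 5, 40

Phi = dct(np.eye(n), axis=0, norm='ortho').T          # orthonormal DCT basis as columns
c_true = np.zeros(n)
c_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = Phi @ c_true                                      # signal, sparse in the DCT basis

known = np.sort(rng.choice(n, n - n_missing, replace=False))   # observed coordinates (W)
PwPhi = Phi[known, :]                                 # P_W Phi: keep only the observed rows

# min ||c||_1 s.t. (P_W Phi) c = P_W x, written as an LP over c = u - v with u, v >= 0
res = linprog(np.ones(2 * n), A_eq=np.hstack([PwPhi, -PwPhi]), b_eq=x[known], bounds=(0, None))
c_hat = res.x[:n] - res.x[n:]
print("reconstruction error:", np.linalg.norm(Phi @ c_hat - x))   # typically near machine precision
```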
Application of Theorem 3.8 provides a sufficient condition for missing data recovery to
succeed.
Theorem 6.5 ([19]) Let $x \in \mathbb{R}^n$, let W be a subspace of $\mathbb{R}^n$, and let Φ be an orthonormal basis for $\mathbb{R}^n$. If $\|\Phi^T x\|_0 < \tfrac{1}{2}\bigl(1 + \mu(P_W \Phi)^{-1}\bigr)$, then
$$\Phi^T x = \mathrm{argmin}_c \|c\|_1 \quad \text{subject to} \quad P_W \Phi c = P_W x.$$

6.2.2 A Frame Approach


As before, we now assume that the sparsifying system Φ is a redundant Parseval frame. The adapted version of (5), which places the ℓ1 norm on the analysis side, reads
$$\min_{\tilde{x}} \|\Phi^T \tilde{x}\|_1 \quad \text{subject to} \quad P_W \tilde{x} = P_W x. \tag{6}$$

Employing relative sparsity and cluster coherence, an error analysis can be derived in a
similar way as before. For the proof, the reader might want to consult [42].

Theorem 6.6 ([42]) Let $x \in \mathbb{R}^n$, let Φ be a Parseval frame for $\mathbb{R}^n$ with indexing set $\{1, \ldots, N\}$, and let $\Lambda \subset \{1, \ldots, N\}$. Further, suppose that x is δ-relatively sparse in Φ with respect to Λ, and let $x^\star$ be a solution of the minimization problem (6). Then
$$\|x^\star - x\|_2 \le \frac{2\delta}{1 - 2\mu_c},$$
where $\mu_c = \mu_c(\Lambda, P_{W^\perp} \Phi; \Phi)$.

6.3 Further Applications


Other applications of compressed sensing include coding and information theory, machine
learning, hyperspectral imaging, geophysical data analysis, computational biology, remote
sensing, radar analysis, robotics and control, A/D conversion, and many more. Since an
elaborate discussion of all those topics would go beyond the scope of this survey paper, we
refer the interested reader to dsp.rice.edu/cs.

Acknowledgements The author is grateful to the reviewers for many helpful suggestions which im-
proved the presentation of the paper. She would also like to thank Emmanuel Candès, David Donoho,
Michael Elad, and Yonina Eldar for various discussions on related topics, and Sadegh Jokar for pro-
ducing Figure 3. The author acknowledges support by the Einstein Foundation Berlin, by Deutsche
Forschungsgemeinschaft (DFG) Grants SPP-1324 KU 1446/13 and KU 1446/14, and by the DFG Research Center Matheon "Mathematics for key technologies" in Berlin.

References
[1] B. Adcock and A. C. Hansen. Generalized sampling and infinite dimensional compressed sensing.
Preprint, 2012.
[2] M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing of overcom-
plete dictionaries for sparse representation. IEEE Trans. Signal Proc., 54:4311–4322, 2006.
[3] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the Restricted
Isometry Property for random matrices. Constr. Approx., 28:253-263, 2008.
[4] L. Borup, R. Gribonval, and M. Nielsen. Beyond coherence: Recovering structured time-frequency
representations. Appl. Comput. Harmon. Anal., 14:120–128, 2008.
[5] J. Bourgain, S. Dilworth, K. Ford, S. Konyagin, and D. Kutzarova. Explicit constructions of rip
matrices and related problems. Duke Math. J., 159:145–185, 2011.
[6] B. Boufounos, G. Kutyniok, and H. Rauhut. Sparse recovery from combined fusion frame mea-
surements. IEEE Trans. Inform. Theory, 57:3864–387, 2011.
[7] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to
sparse modeling of signals and images. SIAM Rev., 51:34–81, 2009.
[8] E. J. Candès. The restricted isometry property and its implications for compressed sensing. C. R.
Acad. Sci. I, 346:589–592, 2008.
[9] E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall. Compressed Sensing with Coherent and
Redundant Dictionaries. Appl. Comput. Harmon. Anal., 31:59–73, 2011.
[10] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Found. of Comput.
Math., 9:717–772, 2008.
[11] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction
from highly incomplete Fourier information. IEEE Trans. Inform. Theory, 52:489-509, 2006.


[12] P. G. Casazza and G. Kutyniok. Finite Frames: Theory and Applications, Birkhäuser, Boston,
2012.
[13] P. G. Casazza, G. Kutyniok, and S. Li. Fusion Frames and Distributed Processing. Appl. Comput.
Harmon. Anal. 25:114–132, 2008.
[14] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J.
Sci. Comput., 20:33–61, 1998.
[15] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. J.
Am. Math. Soc., 22:211–231, 2009.
[16] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse
problems with a sparsity constraint. Comm. Pure Appl. Math., 57:1413-1457, 2004.
[17] R. DeVore. Deterministic constructions of compressed sensing matrices. J. Complexity, 23:918–
925, 2007.
[18] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52:1289–1306, 2006.
[19] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionar-
ies via l1 minimization, Proc. Natl. Acad. Sci. USA, 100:2197–2202, 2003.
[20] D. L. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations
in the presence of noise. IEEE Trans. Inform. Theory, 52:6–18, 2006.
[21] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans.
Inform. Theory, 47:2845–2862, 2001.
[22] D. L. Donoho and G. Kutyniok. Microlocal analysis of the geometric separation problem. Comm.
Pure Appl. Math., 66:1–47, 2013.
[23] D. L. Donoho, A. Maleki, and A. Montanari. Message passing algorithms for compressed sensing.
Proc. Natl. Acad. Sci. USA, 106:18914–18919, 2009.
[24] D. L. Donoho and P. B. Starck. Uncertainty principles and signal recovery. SIAM J. Appl. Math.,
49:906–931, 1989.
[25] D. L. Donoho and J. Tanner. Neighborliness of Randomly-Projected Simplices in High Dimen-
sions. Proc. Natl. Acad. Sci. USA, 102:9452–9457, 2005.
[26] D. L. Donoho and J. Tanner. Sparse Nonnegative Solutions of Underdetermined Linear Equations
by Linear Programming. Proc. Natl. Acad. Sci. USA, 102:9446–9451, 2005.
[27] D. L. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional ge-
ometry, with implications for modern data analysis and signal processing. Philos. Trans. Roy. Soc.
S.-A, 367:4273–4293, 2009.
[28] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck. Sparse Solution of Underdetermined Linear
Equations by Stagewise Orthogonal Matching Pursuit. Preprint, 2007.
[29] M. Elad. Sparse and Redundant Representations. Springer, New York, 2010.
[30] M. Elad and A. M. Bruckstein. A generalized uncertainty principle and sparse representation in
pairs of bases. IEEE Trans. Inform. Theory, 48:2558–2567, 2002.
[31] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho. Simultaneous cartoon and texture image in-
painting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal., 19:340–
358, 2005.
[32] Y. C. Eldar and G. Kutyniok. Compressed Sensing: Theory and Applications. Cambridge Univer-
sity Press, 2012.
[33] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction:
Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signa., 1:586–
597, 2007.
[34] S. Foucart. A note on guaranteed sparse recovery via ℓ1-minimization. Appl. Comput. Harmon.
Anal., 29:97–103, 2010.
[35] S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of ℓp -balls for 0 < p ≤ 1. J.
Complexity, 26:629–640, 2010.
[36] A. C. Gilbert, M. J. Strauss, and R. Vershynin. One sketch for all: Fast algorithms for Compressed
Sensing. In Proc. 39th ACM Symp. Theory of Computing (STOC), San Diego, CA, 2007.


[37] B. Grünbaum. Convex polytopes. Graduate Texts in Mathematics 221, Springer-Verlag, New York,
2003.
[38] M. Herman and T. Strohmer. High Resolution Radar via Compressed Sensing. IEEE Trans. Signal
Proc., 57:2275–2284, 2009.
[39] M. A. Iwen. Combinatorial Sublinear-Time Fourier Algorithms. Found. of Comput. Math.,
10:303–338, 2010.
[40] P. Jain, A. Tewari, and I. S. Dhillon. Orthogonal Matching Pursuit with Replacement. In Proc.
Neural Inform. Process. Systems Conf. (NIPS), 2011.
[41] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Con-
temp. Math, 26:189-206, 1984.
[42] E. King, G. Kutyniok, and X. Zhuang. Analysis of Inpainting via Clustered Sparsity and Microlo-
cal Analysis. J. Math. Imaging Vis., to appear.
[43] F. Krahmer and R. Ward. New and improved Johnson-Lindenstrauss embeddings via the Restricted
Isometry Property. SIAM J. Math. Anal., 43:1269–1281, 2011.
[44] G. Kutyniok. Clustered Sparsity and Separation of Cartoon and Texture. SIAM J. Imaging Sci. 6
(2013), 848-874.
[45] G. Kutyniok. Geometric Separation by Single-Pass Alternating Thresholding. Appl. Comput. Har-
mon. Anal., to appear.
[46] G. Kutyniok and D. Labate. Shearlets: Multiscale Analysis for Multivariate Data. Birkhäuser,
Boston, 2012.
[47] G. Kutyniok and W.-Q Lim. Compactly supported shearlets are optimally sparse. J. Approx. The-
ory, 163:1564–1589, 2011.
[48] G. Kutyniok and W.-Q Lim. Image separation using shearlets. In Curves and Surfaces (Avignon,
France, 2010), Lecture Notes in Computer Science 6920, Springer, 2012.
[49] Y. Lu and M. Do. Sampling signals from a union of subspaces. IEEE Signal Proc. Mag., 25:41–47,
2008.
[50] S. G. Mallat. A wavelet tour of signal processing: The sparse way. Academic Press, Inc., San
Diego, CA, 1998.
[51] S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal
Proc., 41:3397–3415, 1993.
[52] M. Mishali, Y. C. Eldar, and A. Elron. Xampling: Signal Acquisition and Processing in Union of
Subspaces. IEEE Trans. Signal Proc., 59:4719-4734, 2011.
[53] S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, Boston, MA,
2005.
[54] S. Nam, M. E. Davies, M. Elad, and R. Gribonval. The Cosparse Analysis Model and Algorithms.
Appl. Comput. Harmon. Anal., 34:30–56, 2013.
[55] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate
samples. Appl. Comput. Harmon. Anal., 26:301–321, 2008.
[56] D. Needell and R. Vershynin. Uniform Uncertainty Principle and signal recovery via Regularized
Orthogonal Matching Pursuit. Found. of Comput. Math., 9:317–334, 2009.
[57] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: Recursive function
approximation with applications to wavelet decomposition. In Proc. of the 27th Asilomar Confer-
ence on Signals, Systems and Computers, 1:40-44, 1993.
[58] G. Pfander, H. Rauhut, and J. Tropp. The restricted isometry property for time-frequency struc-
tured random matrices. Prob. Theory Rel. Fields, to appear.
[59] M. Pfetsch and A. Tillmann. The Computational Complexity of the Restricted Isometry Property,
the Nullspace Property, and Related Concepts in Compressed Sensing. Preprint, 2012.
[60] H. Rauhut, J. Romberg, and J. Tropp. Restricted isometries for partial random circulant matrices.
Appl. Comput. Harmon. Anal., 32:242–254, 2012.
[61] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries.
IEEE Trans. Inform. Theory, 54:2210–2219, 2008.


[62] C. J. Rozell and D. H. Johnson. Analysis of noise reduction in redundant expansions under dis-
tributed processing requirements. In Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 185–188, Philadelphia, PA, 2005.
[63] T. Strohmer and R. W. Heath. Grassmannian frames with applications to coding and communica-
tion. Appl. Comput. Harmon. Anal., 14:257-275, 2004.
[64] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform.
Theory, 50:2231–2242, 2004.
[65] X. Weiyu and B. Hassibi. Compressive Sensing over the Grassmann Manifold: a Unified Ana-
lytical Framework. In 46th Annual Allerton Conf. on Communication, Control, and Computing,
2008.
