Compressed Sensing
David L. Donoho, Member, IEEE
Abstract—Suppose $x$ is an unknown vector in $\mathbf{R}^m$ (a digital image or signal); we plan to measure $n$ general linear functionals of $x$ and then reconstruct. If $x$ is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements $n$ can be dramatically smaller than the size $m$. Thus, certain natural classes of images with $m$ pixels need only $n = O(m^{1/4}\log^{5/2}(m))$ nonadaptive nonpixel samples for faithful recovery, as opposed to the usual $m$ pixel samples. More specifically, suppose $x$ has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor)—so the coefficients belong to an $\ell_p$ ball for $0 < p \le 1$. The $N$ most important coefficients in that expansion allow reconstruction with $\ell_2$ error $O(N^{1/2-1/p})$. It is possible to design $n = O(N\log(m))$ nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the $N$ most important coefficients. Moreover, a good approximation to those $N$ important coefficients is extracted from the $n$ measurements by solving a linear program—Basis Pursuit in signal processing. The nonadaptive measurements have the character of "random" linear combinations of basis/frame elements. Our results use the notions of optimal recovery, of $n$-widths, and information-based complexity. We estimate the Gel'fand $n$-widths of $\ell_p$ balls in high-dimensional Euclidean space in the case $0 < p \le 1$, and give a criterion identifying near-optimal subspaces for Gel'fand $n$-widths. We show that "most" subspaces are near-optimal, and show that convex optimization (Basis Pursuit) is a near-optimal way to extract information derived from these near-optimal subspaces.

Index Terms—Adaptive sampling, almost-spherical sections of Banach spaces, Basis Pursuit, eigenvalues of random matrices, Gel'fand $n$-widths, information-based complexity, integrated sensing and processing, minimum $\ell_1$-norm decomposition, optimal recovery, Quotient-of-a-Subspace theorem, sparse solution of linear equations.

I. INTRODUCTION

… the important information about the signals/images—in effect, not acquiring that part of the data that would eventually just be "thrown away" by lossy compression. Moreover, the protocols are nonadaptive and parallelizable; they do not require knowledge of the signal/image to be acquired in advance—other than knowledge that the data will be compressible—and do not attempt any "understanding" of the underlying object to guide an active or adaptive sensing strategy. The measurements made in the compressed sensing protocol are holographic—thus, not simple pixel samples—and must be processed nonlinearly.

In specific applications, this principle might enable dramatically reduced measurement time, dramatically reduced sampling rates, or reduced use of analog-to-digital converter resources.

A. Transform Compression Background

Our treatment is abstract and general, but depends on one specific assumption which is known to hold in many settings of signal and image processing: the principle of transform sparsity. We suppose that the object of interest is a vector $x$, which can be a signal or image with $m$ samples or pixels, and that there is an orthonormal basis $(\psi_i : i = 1, \dots, m)$ for $x$, which can be, for example, an orthonormal wavelet basis, a Fourier basis, or a local Fourier basis, depending on the application. (As explained later, the extension to tight frames such as curvelet or Gabor frames comes for free.) The object has transform coefficients $\theta_i = \langle x, \psi_i \rangle$, and these are assumed sparse in the sense that, for some $0 < p < 2$ and for some $R > 0$,

$$\|\theta\|_p \equiv \Big(\sum_i |\theta_i|^p\Big)^{1/p} \le R. \qquad (I.1)$$

Such constraints are actually obeyed on natural classes of signals and images; this is the primary reason for the success of standard compression tools based on transform coding.
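As a small numerical illustration of the transform-sparsity assumption (I.1) (not from the paper: the test signal, the choice of an orthonormal DCT as the transform, and $p = 1$ are arbitrary assumptions for the demo):

```python
# Illustrative check of transform sparsity (I.1): a piecewise-smooth signal
# has rapidly decaying transform coefficients, so ||theta||_p is modest for
# p <= 1.  Signal, basis (orthonormal DCT-II), and p are demo choices only.
import numpy as np
from scipy.fft import dct

m = 1024
t = np.linspace(0, 1, m)
x = np.sin(6 * np.pi * t) + (t > 0.5)          # smooth part plus one jump

theta = dct(x, norm="ortho")                   # orthonormal transform coefficients
mags = np.sort(np.abs(theta))[::-1]            # decreasing rearrangement

p = 1.0
lp_quasinorm = np.sum(mags ** p) ** (1.0 / p)
energy_in_top_50 = np.sum(mags[:50] ** 2) / np.sum(mags ** 2)

print(f"||theta||_{p:.0f} = {lp_quasinorm:.2f}")
print(f"fraction of energy in 50 largest of {m} coefficients: {energy_in_top_50:.4f}")
```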
• Bump Algebra model for spectra. Here a spectrum (e.g., mass spectrum or magnetic resonance spectrum) is modeled as digital samples of an underlying function on the real line which is a superposition of so-called spectral lines of varying positions, amplitudes, and linewidths. Formally

… use "OR/IBC" as a generic label for work taking place in those fields, admittedly being less than encyclopedic about various scholarly contributions.

We have a class $X$ of possible objects of interest, and are interested in designing an information operator $I_n$ that samples $n$ pieces of information about $x$, and an algorithm $A_n$ that offers an approximate reconstruction of $x$. Here the information operator takes the form
… operator must be simply measuring individual transform coefficients? Actually, no: the information operator is measuring very complex "holographic" functionals which in some sense mix together all the coefficients in a big soup. Compare (VI.1) below. (Holography is a process where a three-dimensional (3-D) image generates by interferometry a two-dimensional (2-D) transform. Each value in the 2-D transform domain is influenced by each part of the whole 3-D object. The 3-D object can later be reconstructed by interferometry from all or even a part of the 2-D transform domain. Leaving aside the specifics of 2-D/3-D and the process of interferometry, we perceive an analogy here, in which an object is transformed to a compressed domain, and each compressed domain component is a combination of all parts of the original object.)

Another surprise is that, if we enlarged our class of information operators to allow adaptive ones, e.g., operators in which certain measurements are made in response to earlier measurements, we could scarcely do better. Define the minimax error under adaptive information allowing adaptive operators

where, for $i \ge 2$, each kernel $\xi_i$ is allowed to depend on the information $\langle x, \xi_1\rangle, \dots, \langle x, \xi_{i-1}\rangle$ gathered at previous stages. Formally setting

… provided by the wavelet basis; if $p < 1$, this is again symmetric about the origin and orthosymmetric, while not being convex, but still star-shaped.

To develop this geometric viewpoint further, we consider two notions of $n$-width; see [5].

Definition 1.1: The Gel'fand $n$-width of $X$ with respect to the $\ell_2^m$ norm is defined as

$$d^n(X;\ \ell_2^m) = \inf_{V_n} \sup\{\, \|x\|_2 : x \in V_n^{\perp} \cap X \,\}$$

where the infimum is over $n$-dimensional linear subspaces $V_n$ of $\mathbf{R}^m$, and $V_n^{\perp}$ denotes the orthocomplement of $V_n$ with respect to the standard Euclidean inner product.

In words, we look for a subspace such that "trapping" $x$ in that subspace causes $\|x\|_2$ to be small. Our interest in Gel'fand $n$-widths derives from an equivalence between optimal recovery for nonadaptive information and such $n$-widths, well known in the case $p = 1$ [5], and in the present setting extending as follows.

Theorem 3: For $0 < p \le 1$ and $1 \le n < m$

(I.4)

(I.5)
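For orientation, the sets and error measures appearing in Theorem 3 and throughout can be written in the usual optimal-recovery style; the notation below is a reconstruction supplied as a reading aid, not a quotation of the paper's own displays:

```latex
% Standard OR/IBC-style notation (reconstructed, not quoted): the l_p ball
% of compressible objects, and the nonadaptive minimax (worst-case) error.
\[
  X_{p,m}(R) \;=\; \{\, x \in \mathbf{R}^m : \|\Psi^T x\|_p \le R \,\},
  \qquad 0 < p \le 1,
\]
\[
  E_n(X) \;=\; \inf_{A_n,\, I_n}\; \sup_{x \in X}\, \| x - A_n(I_n(x)) \|_2 ,
\]
% where I_n ranges over nonadaptive linear information operators taking n
% measurements and A_n over arbitrary reconstruction maps.
```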
The asymptotic properties of the left-hand side have been determined by Garnaev and Gluskin [9]. This follows major work by Kashin [10], who developed a slightly weaker version of this result in the course of determining the Kolmogorov $n$-widths of Sobolev spaces. See the original papers, or Pinkus's book [8], for more details.

Theorem 4 (Kashin, Garnaev, and Gluskin (KGG)): For all $1 \le n \le m$

Theorem 1 now follows in the case $p = 1$ by applying KGG with the duality formula (I.6) and the equivalence formula (I.4). The case $p < 1$ of Theorem 1 does not allow use of duality, and the whole range $0 < p \le 1$ is approached differently in this paper.

F. Results

Our paper develops two main types of results.

• Near-Optimal Information. We directly consider the problem of near-optimal subspaces for Gel'fand $n$-widths of $\ell_p$ balls, and introduce three structural conditions (CS1–CS3) on an $n$-by-$m$ matrix which imply that its nullspace is near-optimal. We show that the vast majority of $n$-subspaces of $\mathbf{R}^m$ are near-optimal, and random sampling yields near-optimal information operators with overwhelmingly high probability.

• Near-Optimal Algorithm. We study a simple nonlinear reconstruction algorithm: simply minimize the $\ell_1$ norm of the coefficients subject to satisfying the measurements. This has been studied in the signal processing literature under the name Basis Pursuit; it can be computed by linear programming. We show that this method gives near-optimal results for all $0 < p \le 1$.

In short, we provide a large supply of near-optimal information operators and a near-optimal reconstruction procedure based on linear programming, which, perhaps unexpectedly, works even for the nonconvex case $p < 1$.
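For reference, the Kashin–Garnaev–Gluskin estimate invoked in Theorem 4 is usually quoted in the following form in the widths literature; this is supplied as background context, not as the paper's own display:

```latex
% Commonly quoted form of the Kashin--Garnaev--Gluskin estimate for the
% Gel'fand widths of the l_1 ball (constants absolute):
\[
  d^n\!\left(b_1^m;\, \ell_2^m\right)
  \;\asymp\;
  \min\!\left( 1,\ \sqrt{\tfrac{\log(m/n) + 1}{n}} \right),
  \qquad 1 \le n \le m .
\]
```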
E. Mysteries …

Because of the indirect manner by which the KGG result implies Theorem 1, we really do not learn much about the phenomenon of interest in this way. The arguments of Kashin, Garnaev, and Gluskin show that there exist near-optimal $n$-dimensional subspaces for the Kolmogorov widths; they arise as the null spaces of certain matrices with $\pm 1$ entries which are known to exist by counting the number of matrices lacking certain properties, the total number of matrices with $\pm 1$ entries, and comparing. The interpretability of this approach is limited.

The implicitness of the information operator is matched by the abstractness of the reconstruction algorithm. Based on OR/IBC theory we know that the so-called central algorithm is optimal. This "algorithm" asks us to consider, for given information $y_n$, the collection of all objects which could have given rise to the data. Defining now the center of a set $S$, $\mathrm{center}(S)$, the central algorithm is

$$\hat{x}^{*} = \mathrm{center}\{\, x \in X : I_n(x) = y_n \,\}$$

and it obeys, when the information is optimal, the minimax error bound. Moreover, the algorithm delivering the solution to $(P_1)$ is near-optimal:

For a taste of the type of result we obtain, consider a specific information/algorithm combination.

• CS Information. Let $\Phi$ be an $n \times m$ matrix generated by randomly sampling the columns, with different columns independent and identically distributed (i.i.d.) random uniform on $S^{n-1}$. With overwhelming probability for large $n$, $\Phi$ has properties CS1–CS3 discussed in detail in Section II-A below; assume we have achieved such a favorable draw. Let $\Psi$ be the $m \times m$ basis matrix with basis vector $\psi_i$ as the $i$th column. The CS Information operator is the $n \times m$ matrix $\Phi\Psi^T$.

• $\ell_1$-minimization. To reconstruct from CS Information, we solve the convex optimization problem

$$\min \|\Psi^T x\|_1 \quad \text{subject to} \quad y_n = \Phi\Psi^T x.$$

In words, we look for the object having coefficients with smallest $\ell_1$ norm that is consistent with the information $y_n$.

To evaluate the quality of an information operator $I_n$, set

To evaluate the quality of a combined algorithm/information pair $(A_n, I_n)$, set
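The following sketch mimics the CS Information / $\ell_1$-minimization pair just described, with $\Psi$ taken as the identity for simplicity; the problem sizes, random seed, and the use of SciPy's `linprog` as the LP solver are illustrative choices, not part of the paper:

```python
# Sketch of the CS information / Basis Pursuit pipeline described above.
# Hypothetical sizes and sparsity level; SciPy's linprog is used as the LP
# solver, which is an implementation choice, not the paper's.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 200, 60, 8          # signal length, measurements, sparsity

# Columns of Phi drawn i.i.d. uniform on the unit sphere S^{n-1}.
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)

# A k-sparse coefficient vector theta (basis Psi taken as the identity here).
theta = np.zeros(m)
support = rng.choice(m, k, replace=False)
theta[support] = rng.standard_normal(k)

y = Phi @ theta               # the n "holographic" measurements

# Basis Pursuit: min ||theta||_1 s.t. Phi theta = y, posed as an LP in
# nonnegative variables (u, v) with theta = u - v.
c = np.ones(2 * m)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
theta_hat = res.x[:m] - res.x[m:]

print("relative l2 error:",
      np.linalg.norm(theta_hat - theta) / np.linalg.norm(theta))
```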
G. Potential Applications

To see the potential implications, recall first the Bump Algebra model for spectra. In this context, our result says that, for a spectrometer based on the information operator $I_n$ in Theorem 5, it is really only necessary to take $n$ measurements (with $n$ far smaller than $m$) to get an accurate reconstruction of such spectra, rather than the nominal $m$ measurements. However, they must then be processed nonlinearly.

Recall the Bounded Variation model for images. In that context, a result paralleling Theorem 5 says that for a specialized imaging device based on a near-optimal information operator it is really only necessary to take $n$ measurements to get an accurate reconstruction of images with $m$ pixels, rather than the nominal $m$ measurements.

The calculations underlying these results will be given below, along with a result showing that for cartoon-like images (which may model certain kinds of simple natural imagery, like brain scans), the number of measurements for an $m$-pixel image is smaller still.

H. Contents

Section II introduces a set of conditions CS1–CS3 for near-optimality of an information operator. Section III considers abstract near-optimal algorithms, and proves Theorems 1–3. Section IV shows that solving the convex optimization problem gives a near-optimal algorithm whenever $0 < p \le 1$. Section V points out immediate extensions to weak-$\ell_p$ conditions and to tight frames. Section VI sketches potential implications in image, signal, and array processing. Section VII, building on work in [11], shows that conditions CS1–CS3 are satisfied for "most" information operators. Finally, in Section VIII, we note the ongoing work by two groups (Gilbert et al. [12] and Candès et al. [13], [14]), which, although not written in the $n$-widths/OR/IBC tradition, imply (as we explain) closely related results.

II. INFORMATION

Consider information operators constructed as follows. With $\Psi$ the orthogonal matrix whose columns are the basis elements $\psi_i$, and with certain $n$-by-$m$ matrices $\Phi$ obeying conditions specified below, we construct corresponding information operators $I_n = \Phi\Psi^T$. Everything will be completely transparent to the choice of orthogonal matrix $\Psi$ and hence we will assume that $\Psi$ is the identity throughout this section.

In view of the relation between Gel'fand $n$-widths and minimax errors, we may work with $n$-widths. Let $V$ denote, as usual, the nullspace $\{x : \Phi x = 0\}$. We define the width of a set $X$ relative to an operator $\Phi$

$$w(\Phi, X) \;=\; \sup\, \|x\|_2 \quad \text{subject to} \quad x \in X,\ \Phi x = 0. \qquad (\mathrm{II.1})$$

We will show for all large $n$ and $m$ the existence of $n$ by $m$ matrices $\Phi$ where

with the constant dependent at most on $p$ and the ratio $\log(m)/\log(n)$.

A. Conditions CS1–CS3

In the following, with $J \subset \{1, \dots, m\}$, let $\Phi_J$ denote a submatrix of $\Phi$ obtained by selecting just the indicated columns of $\Phi$. We let $V_J$ denote the range of $\Phi_J$ in $\mathbf{R}^n$. Finally, we consider a family of quotient norms on $\mathbf{R}^n$; with $\|\cdot\|_1$ denoting the $\ell_1$ norm on vectors indexed by $\{1, \dots, m\}\setminus J$

subject to

These describe the minimal $\ell_1$-norm representation of $v$ achievable using only specified subsets of columns of $\Phi$.

We define three conditions to impose on an $n \times m$ matrix $\Phi$, indexed by strictly positive parameters, among them $\eta$ and $\rho$.

CS1: The minimal singular value of $\Phi_J$ exceeds $\eta$ uniformly in $|J| < \rho n/\log(m)$.

CS2: On each subspace $V_J$ we have the inequality, uniformly in $|J| < \rho n/\log(m)$.

CS3: On each subspace $V_J$, uniformly in $|J| < \rho n/\log(m)$.

CS1 demands a certain quantitative degree of linear independence among all small groups of columns. CS2 says that linear combinations of small groups of columns give vectors that look much like random noise, at least as far as the comparison of $\ell_1$ and $\ell_2$ norms is concerned. It will be implied by a geometric fact: every $V_J$ slices through the $\ell_1^n$ ball in such a way that the resulting convex section is actually close to spherical. CS3 says that for every vector in some $V_J$, the associated quotient norm is never dramatically smaller than the simple $\ell_1$ norm on $\mathbf{R}^n$.

It turns out that matrices satisfying these conditions are ubiquitous for large $n$ and $m$ when we choose the $\eta$ and $\rho$ properly. Of course, for any finite $n$ and $m$, all norms are equivalent and almost any arbitrary matrix can trivially satisfy these conditions simply by taking some parameters very small and others very large. However, as the definition of "very small" and "very large" would have to depend on $n$ for this trivial argument to work.
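Condition CS1 can be probed numerically by drawing random column subsets $J$ and checking the smallest singular value of $\Phi_J$; the sketch below does exactly that. The subset-size rule $\rho n/\log m$, the value of $\rho$, and all sizes are assumptions made for the demo, not the paper's constants.

```python
# Empirical probe of a CS1-type condition: minimal singular values of random
# column submatrices Phi_J with |J| ~ rho * n / log(m).  Sizes, rho, and the
# number of trials are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, m, rho, trials = 100, 400, 0.5, 200

Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-normalized columns

k = max(1, int(rho * n / np.log(m)))          # subset size |J|
smallest = min(
    np.linalg.svd(Phi[:, rng.choice(m, k, replace=False)], compute_uv=False)[-1]
    for _ in range(trials)
)
print(f"|J| = {k}; smallest singular value over {trials} random subsets: {smallest:.3f}")
```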
We claim something deeper is true: it is possible to choose $\eta$ and $\rho$ independent of $n$ and of $m$.

Consider the set

of all matrices having unit-normalized columns. On this set, measure frequency of occurrence with the natural uniform measure (the product measure, uniform on each factor $S^{n-1}$).

Theorem 6: Let $(n_j, m_j)$ be a sequence of problem sizes with $n_j \to \infty$, $m_j \to \infty$, and $m_j \sim A\,n_j^{\gamma}$, $A > 0$, $\gamma \ge 1$. There exist $\eta > 0$ and $\rho > 0$ depending only on $A$ and $\gamma$ so that, for each $j$, the proportion of all $n_j \times m_j$ matrices satisfying CS1–CS3 with parameters $\eta$ and $\rho$ eventually exceeds any fixed fraction below one.

We will discuss and prove this in Section VII. The proof will show that the proportion of matrices not satisfying the conditions decays exponentially fast in $n$.

For later use, we will leave the constants $\eta$ and $\rho$ implicit and speak simply of CS matrices, meaning matrices that satisfy the given conditions with values of parameters of the type described by this theorem, namely, with $\eta$ and $\rho$ not depending on $n$ and permitting the above ubiquity.

Proof: Consider the optimization problem

subject to

Our goal is to bound the value of this problem. Choose $k$ within the range permitted by CS1–CS3, and let $J$ denote the indices of the $k$ largest values in $|x|$. Without loss of generality, suppose coordinates are ordered so that $J$ comes first among the entries, and partition $x = (x_J, x_{J^c})$. Clearly

(II.2)

while, because each entry in $x_J$ is at least as big as any entry in $x_{J^c}$, (I.2) gives

(II.3)

A similar argument for $\ell_1$ approximation gives, in case $p < 1$,

(II.4)

Now … . Hence, with …, we have … . As … and …, we can invoke CS3, getting

On the other hand, again using … and …, invoke CS2, getting

Combining these with the above

with … . Recalling …, and invoking CS1, we have

III. ALGORITHMS

Given an information operator $I_n$, we must design a reconstruction algorithm $A_n$ which delivers reconstructions compatible in quality with the estimates for the Gel'fand $n$-widths. As discussed in the Introduction, the optimal method in the OR/IBC framework is the so-called central algorithm, which, unfortunately, is typically not efficiently computable in our setting. We now describe an alternate abstract approach, allowing us to prove Theorem 1.

A. Feasible-Point Methods

Another general abstract algorithm from the OR/IBC literature is the so-called feasible-point method, which aims simply to find any reconstruction compatible with the observed information and constraints.
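The partition into the $k$ largest entries and the rest, used in (II.2)–(II.3) above, rests on a standard tail estimate for $\ell_p$ balls; in its usual form (stated here as background, with the constant left generic) it reads:

```latex
% Standard tail bound behind the "k largest entries" argument: if
% \|x\|_p \le R with 0 < p \le 1 and |x|_{(k)} denotes the k-th largest
% magnitude, then
\[
  |x|_{(k)} \;\le\; R\, k^{-1/p},
  \qquad
  \Big( \sum_{j > k} |x|_{(j)}^2 \Big)^{1/2}
  \;\le\; c_p\, R\, k^{\,1/2 - 1/p},
\]
% so discarding all but the k largest coefficients costs only
% O(k^{1/2 - 1/p}) in l_2 norm, as quoted in the Abstract.
```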
As in the case of the central algorithm, we consider, for given information $y_n$, the collection $\hat{X}$ of all objects $x \in X$ which could have given rise to the information. In the feasible-point method, we simply select any member of $\hat{X}$, by whatever means. One can show, adapting standard OR/IBC arguments in [15], [6], [8], the following.

Lemma 3.1: Let $X$ be as above, where $0 < p \le 1$, let $I_n$ be an optimal information operator, and let $\hat{x}$ be any element of $\hat{X}$. Then

(III.1)

In short, any feasible point is within a factor two of optimal.

Proof: We first justify our claims for optimality of the central algorithm, and then show that a feasible point is near to the central algorithm. Let $\hat{x}^{*}$ again denote the result of the central algorithm. Now

radius

Now clearly, in the special case when $x$ is only known to lie in $X$ and $y_n$ is measured, the minimax error is exactly radius$(\hat{X})$. Since this error is achieved by the central algorithm for each such $y_n$, the minimax error over all $x$ is achieved by the central algorithm. This minimax error is

radius

Now the feasible point $\hat{x}$ obeys $\hat{x} \in \hat{X}$; hence,

radius

But the triangle inequality gives

radius

Hence … . More generally, if the information operator is only near-optimal, then the same argument gives

with …, where for given … and …

B. Proof of Theorem 3

Before proceeding, it is convenient to prove Theorem 3. Note that the case $p = 1$ is well known in OR/IBC, so we only need to give an argument for $p < 1$ (though it happens that our argument works for $p = 1$ as well). The key point will be to apply the $p$-triangle inequality

$$\|\theta + \theta'\|_p^{\,p} \;\le\; \|\theta\|_p^{\,p} + \|\theta'\|_p^{\,p},$$

valid for $0 < p \le 1$; this inequality is well known in interpolation theory [17] through Peetre and Sparr's work, and is easy to verify directly.

Suppose without loss of generality that there is an optimal subspace $V_n$, which is fixed and given in this proof. As we just saw

radius

Now

radius

so clearly … . Now suppose without loss of generality that … and … attain the radius bound, i.e., they satisfy

and, for the center, they satisfy

and so

Hence … . However,
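One way to read the factor-of-two claim in Lemma 3.1 (an elementary reconstruction of the step, not a quotation of the paper's argument): the true object and any feasible point both lie in the same consistent set $\hat{X}$, so

```latex
% Elementary reading of "any feasible point is within a factor two of
% optimal": if x and \hat{x} both lie in \hat{X} = {x' in X : I_n(x') = y_n},
\[
  \| \hat{x} - x \|_2
  \;\le\; \operatorname{diam}(\hat{X})
  \;\le\; 2\, \operatorname{radius}(\hat{X})
  \;\le\; 2\, E_n(X),
\]
% the last step holding when I_n is optimal, since the minimax error of
% optimal information is the largest such radius.
```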
Secondly, the entropy numbers obey [20], [21]

At the same time, the combination of Theorems 7 and 6 shows that

Applying now the Feasible-Point method, we have

with immediate extensions to … for all … . We conclude that

D. Proof of Theorem 2

Now is an opportune time to prove Theorem 2. We note that in the case of $p = 1$, this is known [22]–[25]. The argument is the same for $p < 1$, and we simply repeat it. Suppose that $x \in X$, and consider the adaptively constructed subspace according to whatever algorithm is in force. When the algorithm terminates, we have an $n$-dimensional information vector and a subspace consisting of objects which would all give that information vector. For all objects in that subspace, the adaptive information therefore turns out the same. Now the minimax error associated with that information is exactly the radius of that subspace's intersection with $X$; but this cannot be smaller than

radius

The result follows by comparing with $E_n(X)$.

The optimization problem can be cast as a linear program

$$(\mathrm{LP})\qquad \min\ \mathbf{1}^T z \quad \text{subject to} \quad [\Phi, -\Phi]\, z = y_n,\ z \ge 0, \qquad (\mathrm{IV.1})$$

which has a solution $z^{*}$, say, a vector in $\mathbf{R}^{2m}$ which can be partitioned as $z^{*} = (u^{*}, v^{*})$; then $\theta^{*} = u^{*} - v^{*}$ solves $(P_1)$. The reconstruction is $\hat{x} = \Psi\theta^{*}$. This linear program is typically considered computationally tractable. In fact, this problem has been studied in the signal analysis literature under the name Basis Pursuit [26]; in that work, very large-scale underdetermined problems—e.g., with thousands of constraints and tens of thousands of unknowns—were solved successfully using interior-point optimization methods.

As far as performance goes, we already know that this procedure is near-optimal in case $p = 1$; from (III.2) we have the following.

Corollary 4.1: Suppose that $I_n$ is an information operator achieving, for some $C > 0$,

In particular, we have a universal algorithm for dealing with any class $X$—i.e., any basis/frame $\Psi$, any $p$, any $R$. First, apply a near-optimal information operator; second, reconstruct by Basis Pursuit. The result obeys

for a constant depending at most on $p$. The inequality can be put to use as follows. Fix $p < 1$. Suppose the unknown object $x$ is known to be highly compressible, say obeying the a priori bound $\|\theta\|_p \le R$. For any such object, rather than making $m$ measurements, we only need to make $n$ measurements, and our reconstruction obeys
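As a purely numerical reading of the claim that only $n$ rather than $m$ measurements are needed, using the Abstract's scaling $n = O(N\log m)$ and arbitrary example numbers:

```latex
% Illustrative arithmetic, using the Abstract's scaling n = O(N log m):
% suppose m = 10^6 samples would nominally be taken, but the object is
% compressible enough that N = 10^3 coefficients capture it.  Then
\[
  n \;\approx\; N \log m \;=\; 10^{3}\cdot \log(10^{6}) \;\approx\; 1.4\times 10^{4},
\]
% i.e., roughly 70 times fewer measurements than the nominal m = 10^6,
% up to the constant hidden in the O(.).
```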
B. Relation Between $\ell_1$ and $\ell_0$ Minimization

The general OR/IBC theory would suggest that to handle cases where $p < 1$, we would need to solve the nonconvex optimization problem $(P_p)$, which seems intractable. However, in the current situation at least, a small miracle happens: solving $(P_1)$ is again near-optimal. To understand this, we first take a small detour, examining the relation between $\ell_1$ and the extreme case $p = 0$ of the $\ell_p$ spaces. Let us define

$$(P_0)\qquad \min \|\theta\|_0 \quad \text{subject to} \quad \Phi\theta = y_n,$$

where $\|\theta\|_0$ is just the number of nonzeros in $\theta$. Again, since the work of Peetre and Sparr [16], the importance of $\ell_0$ and the relation with $\ell_p$ for $p < 1$ is well understood; see [17] for more detail.

Ordinarily, solving such a problem involving the $\ell_0$ norm requires combinatorial optimization; one enumerates all sparse subsets of $\{1, \dots, m\}$ searching for one which allows a solution. However, when $(P_0)$ has a sparse solution, $(P_1)$ will find it.

In words, although the system of equations is massively underdetermined, $\ell_1$ minimization and sparse solution coincide—when the result is sufficiently sparse.

There is by now an extensive literature exhibiting results on equivalence of $\ell_1$ and $\ell_0$ minimization [27]–[34]. In the early literature on this subject, equivalence was found under conditions involving sparsity constraints allowing $O(\sqrt{n})$ nonzeros. While it may seem surprising that any results of this kind are possible, the sparsity constraint $O(\sqrt{n})$ is, ultimately, disappointingly small. A major breakthrough was the contribution of Candès, Romberg, and Tao [13] which studied the matrices built by taking $n$ rows at random from an $m$ by $m$ Fourier matrix and gave a bound of order nearly $n$ (up to a logarithmic factor), showing that dramatically weaker sparsity conditions were needed than the results known previously. In [11], it was shown that for 'nearly all' $n$ by $m$ matrices with $m$ proportional to $n$, equivalence held for $\le \rho n$ nonzeros, $\rho > 0$. The above result says effectively that for 'nearly all' $n$ by $m$ matrices with $m$ growing polynomially in $n$, equivalence holds up to $\rho n/\log(m)$ nonzeros, where $\rho > 0$.

Our argument, in parallel with [11], shows that the nullspace of $\Phi$ has a very special structure for $\Phi$ obeying the conditions in question. When $\theta$ is sparse, the only element in a given affine subspace $\theta + \mathrm{nullspace}(\Phi)$ which can have small $\ell_1$ norm is $\theta$ itself.

To prove Theorem 8, we first need a lemma about the non-sparsity of elements in the nullspace of $\Phi$. Let $J \subset \{1, \dots, m\}$ and, for a given vector $v$, let $v_J$ denote the mutilated vector with entries $v_i 1_{\{i \in J\}}$. Define the concentration

This measures the fraction of $\ell_1$ norm which can be concentrated on a certain subset for a vector in the nullspace of $\Phi$. This concentration cannot be large if $|J|$ is small.

Lemma 4.1: Suppose that $\Phi$ satisfies CS1–CS3, with constants $\eta$ and $\rho$. There is a constant $\eta_0$ depending on the $\eta$ so that if $J$ satisfies … then …

Proof: This is a variation on the argument for Theorem 7. Let $v \in \mathrm{nullspace}(\Phi)$. Assume without loss of generality that $J$ is the most concentrated subset of cardinality $|J| \le \rho n/\log(m)$, and that the columns of $\Phi$ are numbered so that $J$ comes first; partition $v = (v_J, v_{J^c})$. We again consider …, and have … . We again invoke CS2–CS3, getting

We invoke CS1, getting

Proof of Theorem 8: Suppose that $\Phi\theta = y_n$ and $\theta$ is supported on a subset $J$.

We first show that if $|J|$ is small enough, $\theta$ is the only minimizer of $(P_1)$. Suppose that $\theta'$ is a solution to $(P_1)$, obeying $\|\theta'\|_1 \le \|\theta\|_1$. Then $\theta' = \theta + v$ where $v \in \mathrm{nullspace}(\Phi)$. We have

Invoking the definition of the concentration twice

Now … gives … and we have

i.e., $v = 0$.

Now recall the constant $\eta_0$ of Lemma 4.1. Define $\rho_0$ so that … and … . Lemma 4.1 shows that … implies … .

C. Near-Optimality of Basis Pursuit for $0 < p < 1$

We now return to the claimed near-optimality of Basis Pursuit throughout the range $0 < p < 1$.

Theorem 9: Suppose that $\Phi$ satisfies CS1–CS3 with constants $\eta$ and $\rho$. There is $C > 0$ so that a solution to a problem instance of $(P_1)$ with $\|\theta\|_p \le R$ obeys

The proof requires an $\ell_1$ stability lemma, showing the stability of $\ell_1$ minimization under small perturbations as measured in $\ell_1$ norm. For related stability lemmas, see [33]–[35]; however, note that those lemmas do not suffice for our needs in this proof.
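The concentration functional defined above, the fraction of $\ell_1$ norm that a nullspace vector can place on a small subset, can be probed numerically. The sketch below uses illustrative sizes, a random $\Phi$, and SciPy's `null_space`; it simply measures the worst concentration over the $k$ largest-magnitude coordinates of randomly sampled nullspace elements.

```python
# Probe of the l1-concentration of nullspace vectors: for v in nullspace(Phi),
# how much of ||v||_1 can sit on its k largest-magnitude coordinates?
# Sizes, k, and the sampling scheme are illustrative assumptions.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
n, m, k = 100, 400, 10

Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)
N = null_space(Phi)                 # m x (m - n) orthonormal basis of the nullspace

worst = 0.0
for _ in range(500):
    v = N @ rng.standard_normal(N.shape[1])        # random nullspace vector
    mags = np.sort(np.abs(v))[::-1]
    worst = max(worst, mags[:k].sum() / mags.sum())

print(f"max fraction of l1 norm on {k} coordinates "
      f"over sampled nullspace vectors: {worst:.3f}")
```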
while

As $\hat\theta$ solves $(P_1)$,

and of course

Hence,

and (IV.2) follows.

Proof of Theorem 9: We use the same general framework as in Theorem 7. Let $\theta$ obey $\|\theta\|_p \le R$ where $0 < p < 1$. Let $\hat\theta$ be the solution to $(P_1)$, and set $v = \hat\theta - \theta$. Let $\eta_0$ be as in Lemma 4.1 and set $k$ accordingly. Let $J$ index the $k$ largest amplitude entries in $\theta$. From … and (II.4) we have

and Lemma 4.1 provides

Applying Lemma 4.2,

(IV.3)

The vector … lies in … and has … . Hence,

We conclude by homogeneity that

… and the reconstruction formula $\hat{x} = \Psi\hat\theta$. In fact, Theorems 7 and 9 only need the Parseval relation in the proof. Hence, the same results hold without change when the relation between $x$ and $\theta$ involves a tight frame. In particular, if $\Phi$ is an $n \times m$ matrix satisfying CS1–CS3, then $I_n = \Phi\Psi^T$ defines a near-optimal information operator, and solution of the optimization problem

B. Weak-$\ell_p$ Balls

Our main results so far have been stated for $\ell_p$ spaces, but the proofs hold for weak-$\ell_p$ balls as well ($p < 1$). The weak-$\ell_p$ ball of radius $R$ consists of vectors $\theta$ whose decreasing rearrangements $|\theta|_{(1)} \ge |\theta|_{(2)} \ge \cdots$ obey

$$|\theta|_{(N)} \;\le\; R\, N^{-1/p}, \qquad N = 1, 2, 3, \dots$$

Conversely, for a given $\theta$, the smallest $R$ for which these inequalities all hold is defined to be the norm: $R = \|\theta\|_{w\ell_p}$. The "weak" moniker derives from $\|\theta\|_{w\ell_p} \le \|\theta\|_p$. Weak-$\ell_p$ constraints have the following key property: if $\theta_N$ denotes a mutilated version of the vector $\theta$ with all except the $N$ largest items set to zero, then the inequality

(V.1)

is valid for suitable exponents, with a constant depending only on $p$. In fact, Theorems 7 and 9 only needed (V.1) in the proof, together with (implicitly) … . Hence, we can state results for spaces defined using only weak-$\ell_p$ norms, and the proofs apply without change.
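A direct computation of the weak-$\ell_p$ quantity just defined; the test vector and the value of $p$ are arbitrary illustrative choices:

```python
# Sketch: computing the weak-l_p "norm" of a coefficient vector as the
# smallest R with |theta|_(N) <= R * N**(-1/p) for all N (see the definition
# above).  The vector and p are illustrative choices.
import numpy as np

def weak_lp_norm(theta, p):
    """Smallest R such that the N-th largest |entry| is <= R * N**(-1/p)."""
    mags = np.sort(np.abs(theta))[::-1]          # decreasing rearrangement
    N = np.arange(1, len(mags) + 1)
    return float(np.max(mags * N ** (1.0 / p)))

rng = np.random.default_rng(1)
theta = rng.standard_normal(1000) / np.arange(1, 1001) ** 2   # fast-decaying

p = 0.5
weak = weak_lp_norm(theta, p)
strong = np.sum(np.abs(theta) ** p) ** (1.0 / p)
print(f"weak-l_{p} = {weak:.3f}  <=  l_{p} = {strong:.3f}")
```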
… has expansion coefficients in a basis or frame that obey a particular $\ell_p$ or weak-$\ell_p$ embedding, and then apply the above abstract theory.

A. Bump Algebra

Consider the class $\mathcal{F}$ of functions which are restrictions to the unit interval of functions belonging to the Bump Algebra [2], with bump norm $\le B$. This was mentioned in the Introduction, which observed that the wavelet coefficients at level $j$ obey an $\ell_1$ bound whose constant depends only on the wavelet used. Here and later we use standard wavelet analysis notations as in [36], [37], [2].

We consider two ways of approximating functions in $\mathcal{F}$. In the classic linear scheme, we fix a "finest scale" $j_1$ and measure the resumé coefficients $\beta_{j_1,k} = \langle f, \phi_{j_1,k}\rangle$, with $\phi$ a smooth function integrating to $1$. Think of these as point samples at scale $2^{-j_1}$ after applying an antialiasing filter. We reconstruct by … giving an approximation error

with a constant depending only on the chosen wavelet. There are $2^{j_1}$ coefficients associated with the unit interval, and so the approximation error obeys

In the compressed sensing scheme, we need also wavelets $\psi_{j,k}$, where $\psi$ is an oscillating function with mean zero. We pick a coarsest scale $j_0$. We measure the resumé coefficients $\beta_{j_0,k}$—there are $2^{j_0}$ of these—and then let $\theta$ denote an enumeration of the detail wavelet coefficients. The dimension $m$ of $\theta$ is … . The norm satisfies

… optimal algorithm of $\ell_1$ minimization to the resulting information, getting the error estimate

with a constant independent of … . The overall reconstruction

has error

The compressed sensing scheme takes a total of $2^{j_0}$ samples of resumé coefficients and $n$ samples associated with detail coefficients, for a total of … pieces of information. It achieves error comparable to classical sampling based on $2^{j_1}$ samples. Thus, it needs dramatically fewer samples for comparable accuracy: roughly speaking, only the cube root of the number of samples of linear sampling.

To achieve this dramatic reduction in sampling, we need an information operator based on some $\Phi$ satisfying CS1–CS3. The underlying measurement kernels will be of the form

(VI.1)

where the collection is simply an enumeration of the wavelets $\psi_{j,k}$, with $j_0 \le j < j_1$ and $0 \le k < 2^{j}$.

B. Images of Bounded Variation

We consider now the model with images of Bounded Variation from the Introduction. Let $\mathcal{F}$ denote the class of functions with domain $[0,1]^2$, having total variation at most $B$ [38], and bounded in absolute value by $B$ as well. In the Introduction, it was mentioned that the wavelet coefficients at level $j$ obey a bound whose constant depends only on the wavelet used. It is also true that … .

We again consider two ways of approximating functions in $\mathcal{F}$. The classic linear scheme uses a 2-D version of the scheme we have already discussed. We again fix a "finest scale" $j_1$ and measure the resumé coefficients $\beta_{j_1,k} = \langle f, \phi_{j_1,k}\rangle$ where now $k$ is a pair of integers $(k_1, k_2)$, $0 \le k_1, k_2 < 2^{j_1}$, indexing position. We use the Haar scaling function

We reconstruct by … giving an approximation error
This is no better than the performance of linear sampling for the Bounded Variation case, despite the piecewise-smooth character of $f$; the possible discontinuities in $f$ are responsible for the inability of linear sampling to improve its performance compared to Bounded Variation.

In the compressed sensing scheme, we pick a coarsest scale $j_0$. We measure the resumé coefficients in a smooth wavelet expansion—there are $4^{j_0}$ of these—and then let $\theta$ denote a concatenation of the finer scale curvelet coefficients. The dimension $m$ of $\theta$ is …, with the increase due to the overcompleteness of curvelets. The weak "norm" obeys
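Taking the "cube root" comparison for the Bump Algebra scheme above at face value (and suppressing the logarithmic factors and constants, which the surrounding text does not spell out here), the saving can be read numerically as follows:

```latex
% Numerical reading of the "cube root" comparison (log factors suppressed):
% if classical linear sampling of a spectrum uses m coefficients, the
% compressed sensing scheme described above needs on the order of m^{1/3}:
\[
  m = 10^{6} \ \text{linear samples}
  \quad\longrightarrow\quad
  n \,\sim\, m^{1/3} = 10^{2} \ \text{compressed-sensing measurements},
\]
% up to logarithmic factors and constants.
```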
Indeed, note that the probability measure on $\Phi$ induced by sampling columns i.i.d. uniform on $S^{n-1}$ is exactly the natural uniform measure on the set of matrices with unit-normalized columns. Hence, Theorem 6 follows immediately from Theorem 10.

In effect, matrices satisfying the CS conditions are so ubiquitous that it is reasonable to generate them by sampling at random from a uniform probability distribution.

The proof of Theorem 10 is conducted over Sections VII-A–C; it proceeds by studying, for each $n$, events of the form {CS1 holds}, etc. It will be shown that for suitable parameters and sufficiently large $n$

then defining … and …, we have

Since, when this event occurs, our random draw has produced a matrix obeying CS1–CS3 with parameters $\eta$ and $\rho$, this proves Theorem 10. The proof actually shows that for some $\beta > 0$ the convergence is exponentially fast.

Then, for each …, for … sufficiently small

The third and final idea is that bounds for individual subsets can control simultaneous behavior over all $J$. This is expressed as follows.

Lemma 7.3: Suppose we have events all obeying, for some fixed … and …

with … . Pick … with … and … with … . Then for all …

for some …

with …

Our main goal of this subsection, Lemma 7.1, now follows by combining these three ideas.

It remains only to prove Lemma 7.3. Let

with

so we get … . Taking … as given, we get the desired conclusion.
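The counting behind Lemma 7.3, namely that exponentially small per-subset failure probabilities survive a union over all small subsets $J$, can be made explicit as follows (a standard estimate, supplied here as background rather than as the paper's own display):

```latex
% Why per-subset bounds control all subsets simultaneously: the number of
% subsets J of size k out of m is at most (em/k)^k, so if each bad event has
% probability at most exp(-beta n) with k <= rho n / log m, then
\[
  \Pr\Big[\bigcup_{|J| \le k} \Omega_J^{c}\Big]
  \;\le\; \binom{m}{k}\, e^{-\beta n}
  \;\le\; \exp\big( k \log(em/k) - \beta n \big),
\]
% which still tends to zero exponentially fast once k log(em/k) is a small
% multiple of n.
```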
We recall a standard implication of so-called Vapnik–Chervonenkis theory [46]

and

Hence, the total number of sign patterns generated by the operators obeys

and note that this depends quite weakly on … ,

and, putting … ,

If … is empty, then the process terminates, and set … . Termination must occur at stage … . At termination

while also

We adapt the arguments deployed there. We define bounds … and … for … and …, of the form

so that

where … denotes the least-squares projector. In effect, identify the indices where … exceeds half the forbidden level, and "kill" those indices. Put

Continue this process, producing …, …, etc., with stage-dependent thresholds successively closer to … . Set

Recall that the event … is defined in terms of … and … . On the event …, … . Lemma 7.1 implicitly defined a quantity lowerbounding the minimum eigenvalue of
and

That paper also stated two simple large deviations bounds.

Lemma 7.8: Let $X_1, X_2, \dots$ be i.i.d. … . Then

and

Applying this, we note that the event

VIII. CONCLUSION

A. Summary

We have described an abstract framework for compressed sensing of objects which can be represented as vectors $x \in \mathbf{R}^m$. We assume the object of interest is a priori compressible so that $\|\Psi^T x\|_p \le R$ for a known basis or frame $\Psi$ and $0 < p \le 1$. Starting from an $n$ by $m$ matrix $\Phi$ with $n < m$ satisfying conditions CS1–CS3, and with $\Psi$ the matrix of an orthonormal basis or tight frame underlying $X$, we define the information operator $I_n = \Phi\Psi^T$. Starting from the $n$-tuple of measured information $y_n = I_n(x)$, we reconstruct an approximation to $x$ by solving

$$\min \|\Psi^T x\|_1 \quad \text{subject to} \quad I_n(x) = y_n.$$
[11] D. L. Donoho, "For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution," Commun. Pure Appl. Math., to be published.
[12] A. C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss, "Near-optimal sparse Fourier representations via sampling," in Proc. 34th ACM Symp. Theory of Computing, Montréal, QC, Canada, May 2002, pp. 152–161.
[13] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, to be published.
[14] E. J. Candès, "Robust Uncertainty Principles and Signal Recovery," presented at the 2nd Int. Conf. Computational Harmonic Analysis, Nashville, TN, May 2004.
[15] C. A. Micchelli and T. J. Rivlin, "A survey of optimal recovery," in Optimal Estimation in Approximation Theory, C. A. Micchelli and T. J. Rivlin, Eds. Plenum, 1977, pp. 1–54.
[16] J. Peetre and G. Sparr, "Interpolation of normed abelian groups," Ann. Mat. Pura Appl., ser. 4, vol. 92, pp. 217–262, 1972.
[17] J. Bergh and J. Löfström, Interpolation Spaces. An Introduction. Berlin, Germany: Springer-Verlag, 1976.
[18] B. Carl, "Entropy numbers, s-numbers, and eigenvalue problems," J. Funct. Anal., vol. 41, pp. 290–306, 1981.
[19] G. Pisier, The Volume of Convex Bodies and Banach Space Geometry. Cambridge, U.K.: Cambridge Univ. Press, 1989.
[20] C. Schütt, "Entropy numbers of diagonal operators between symmetric Banach spaces," J. Approx. Theory, vol. 40, pp. 121–128, 1984.
[21] T. Kühn, "A lower estimate for entropy numbers," J. Approx. Theory, vol. 110, pp. 120–124, 2001.
[22] S. Gal and C. Micchelli, "Optimal sequential and nonsequential procedures for evaluating a functional," Appl. Anal., vol. 10, pp. 105–120, 1980.
[23] A. A. Melkman and C. A. Micchelli, "Optimal estimation of linear operators from inaccurate data," SIAM J. Numer. Anal., vol. 16, pp. 87–105, 1979.
[24] M. A. Kon and E. Novak, "The adaption problem for approximating linear operators," Bull. Amer. Math. Soc., vol. 23, pp. 159–165, 1990.
[25] E. Novak, "On the power of adaption," J. Complexity, vol. 12, pp. 199–237, 1996.
[26] S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1999.
[27] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845–2862, Nov. 2001.
[28] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and sparse representations in pairs of bases," IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2558–2567, Sep. 2002.
[29] D. L. Donoho and M. Elad, "Optimally sparse representation from overcomplete dictionaries via ℓ1 norm minimization," Proc. Natl. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003.
[30] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Trans. Inf. Theory, vol. 49, no. 12, pp. 3320–3325, Dec. 2003.
[31] J. J. Fuchs, "On sparse representation in arbitrary redundant bases," IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1341–1344, Jun. 2004.
[32] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[33] ——, "Just relax: Convex programming methods for identifying sparse signals in noise," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1030–1051, Mar. 2006.
[34] D. L. Donoho, M. Elad, and V. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.
[35] D. L. Donoho, "For most underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution," Commun. Pure Appl. Math., to be published.
[36] I. C. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM, 1992.
[37] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA: Academic, 1998.
[38] A. Cohen, R. DeVore, P. Petrushev, and H. Xu, "Nonlinear approximation and the space BV(R²)," Amer. J. Math., vol. 121, pp. 587–628, 1999.
[39] E. J. Candès and D. L. Donoho, "Curvelets—A surprisingly effective nonadaptive representation for objects with edges," in Curves and Surfaces, C. Rabut, A. Cohen, and L. L. Schumaker, Eds. Nashville, TN: Vanderbilt Univ. Press, 2000.
[40] ——, "New tight frames of curvelets and optimal representations of objects with piecewise C² singularities," Comm. Pure Appl. Math., vol. LVII, pp. 219–266, 2004.
[41] S. J. Szarek, "Spaces with large distances to ℓ∞ⁿ and random matrices," Amer. J. Math., vol. 112, pp. 819–842, 1990.
[42] ——, "Condition numbers of random matrices," J. Complexity, vol. 7, pp. 131–149, 1991.
[43] A. Dvoretsky, "Some results on convex bodies and Banach spaces," in Proc. Symp. Linear Spaces, Jerusalem, Israel, 1961, pp. 123–160.
[44] T. Figiel, J. Lindenstrauss, and V. D. Milman, "The dimension of almost-spherical sections of convex bodies," Acta Math., vol. 139, pp. 53–94, 1977.
[45] V. D. Milman and G. Schechtman, Asymptotic Theory of Finite-Dimensional Normed Spaces (Lecture Notes in Mathematics). Berlin, Germany: Springer-Verlag, 1986, vol. 1200.
[46] D. Pollard, Empirical Processes: Theory and Applications. Hayward, CA: Inst. Math. Statist., vol. 2, NSF-CBMS Regional Conference Series in Probability and Statistics.
[47] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies," Applied and Computational Mathematics, Calif. Inst. Technol., Tech. Rep., 2004.