Extensions of Compressed Sensing
Yaakov Tsaig and David L. Donoho
October 22, 2004
Abstract
We study the notion of Compressed Sensing (CS) as put forward in [14] and related work [20, 3, 4]. The basic idea behind CS is that a signal or image, unknown but supposed to be compressible by a known transform (e.g. wavelet or Fourier), can be subjected to fewer measurements than the nominal number of pixels, and yet be accurately reconstructed. The samples are nonadaptive and measure ‘random’ linear combinations of the transform coefficients. Approximate reconstruction is obtained by solving for the transform coefficients consistent with measured data and having the smallest possible ℓ_1 norm.
We perform a series of numerical experiments which validate in general terms the basic idea proposed in [14, 3, 5], in the favorable case where the transform coefficients are sparse in the strong sense that the vast majority are zero. We then consider a range of less favorable cases, in which the object has all coefficients nonzero, but the coefficients obey an ℓ_p bound, for some p ∈ (0, 1]. These experiments show that the basic inequalities behind the CS method seem to involve reasonable constants.
We next consider synthetic examples modelling problems in spectroscopy and image processing, and note that reconstructions from CS are often visually “noisy”. We post-process using translation-invariant de-noising, and find the visual appearance considerably improved.
We also consider a multiscale deployment of compressed sensing, in which various scales are segregated and CS applied separately to each; this gives much better quality reconstructions than a literal deployment of the CS methodology.
We also report that several workable families of ‘random’ linear combinations all behave equivalently, including random spherical, random signs, partial Fourier, and partial Hadamard.
These results show that, when appropriately deployed in a favorable setting, the CS framework is able to save significantly over traditional sampling, and there are many useful extensions of the basic idea.
Key Words and Phrases: Basis Pursuit. Underdetermined Systems of Linear Equations.
Linear Programming. Random Matrix Theory.
Acknowledgements. Partial support from NSF DMS 00-77261 and 01-40698 (FRG), and ONR. Thanks to Michael Saunders for optimization advice, and to Emmanuel Candès for discussions of his own related work with J. Romberg and T. Tao. Raphy Coifman asked us insistently about the question of the constants in (1.2), which inspired this report.
1 Introduction
In the modern multimedia world, ‘everyone’ knows that all humanly-intelligible data are highly
compressible. In exploiting this fact, the dominant approach is to first sample the data, and then
eliminate redundancy using various compression schemes. This raises the question: why is it
necessary to sample the data in a pedantic way and then later to compress it? Can’t one directly
acquire a compressed representation? Clearly, if this were possible, there would be implications
in a range of different fields, extending from faster data acquisition, to higher effective sampling
rates, and lower communications burden.
Several recent papers [20, 3, 14, 5] have shown that, under various assumptions, it may
be possible to directly acquire a form of compressed representation. In this paper, we put such
ideas to the test by making a series of empirical studies of the effectiveness of compressed sensing
schemes.
In the CS framework of [14], the object of interest is a vector x_0 ∈ R^m with transform coefficients θ = Ψ^T x_0 in a known orthobasis Ψ; one takes n < m nonadaptive measurements y_n = ΦΨ^T x_0, where Φ is an n × m ‘CS matrix’, and reconstructs by solving

   (L_1)   min ‖Ψ^T x‖_1 subject to y_n = ΦΨ^T x.   (1.1)

Call the result x̂_{1,n}. In words, x̂_{1,n} is, among all objects generating the same measured data, the one having transform coefficients with the smallest ℓ_1 norm. In [14] it was mentioned that this reconstruction procedure can be implemented by linear programming, and so may be considered computationally tractable. Also, the needed matrices Φ, satisfying conditions CS1-CS3 of [14], were shown to be constructible by random sampling, e.g. drawing the columns of Φ from a uniform distribution on the sphere.
In short, the approach involves linear, nonadaptive measurement, followed by nonlinear approximate reconstruction.
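To make this concrete, here is a minimal numerical sketch of the recipe (our own illustration, not code from [14]); for simplicity we take Ψ = I, so the object and its transform coefficients coincide. The ℓ_1 problem is recast as a linear program via the standard splitting θ = u − v, u, v ≥ 0.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 256, 80, 10            # nominal dimension, measurements, nonzeros

theta0 = np.zeros(m)             # sparse coefficient vector (Psi = I here)
theta0[rng.choice(m, k, replace=False)] = rng.standard_normal(k)

Phi = rng.standard_normal((n, m)) / np.sqrt(n)   # 'random' CS matrix
y = Phi @ theta0                                 # n nonadaptive measurements

# (L1) as a linear program: min 1'(u+v) s.t. Phi(u - v) = y, u, v >= 0
c = np.ones(2 * m)
res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y, bounds=(0, None))
theta_hat = res.x[:m] - res.x[m:]

print("relative error:",
      np.linalg.norm(theta_hat - theta0) / np.linalg.norm(theta0))
```

When n exceeds k by a factor of three or four, runs of this kind typically recover θ_0 essentially exactly, consistent with the behavior reported below.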
The paper [14] proved error bounds showing that, despite the apparent undersampling (n < m), accurate reconstruction is possible for compressible objects. Such bounds take the form

   ‖x̂_{1,n} − x_0‖_2 ≤ C_p · R · (n/log(m))^{1/2−1/p},   n, m > n_0,   (1.2)

where R bounds the ℓ_p norm of the transform coefficients. As p < 2, this bound guarantees that reconstruction is accurate for large n, with a very weak dependence on m. Such bounds were interpreted in [14] to say that n measurements with n = O(N log(m)) are just as good as knowing the N biggest transform coefficients. Examples were sketched for model problems caricaturing imaging and spectroscopy.
In related prior work, classical literature in approximation theory [23, 19, 27] (developing
the theory of Gel’fand n-widths) deals with closely related problems from an even more abstract
viewpoint; see the discussion in [14]. More recently, Gilbert et al. [20] considered n-by-m
matrices Φ made of n special rows out of the m-by-m Fourier matrix, while Candès, Romberg
and Tao considered matrices Φ made of n randomly chosen rows from the Fourier matrix.
Candès, Romberg, and Tao [3] also considered the use of ℓ_1 minimization, just as here, while
Gilbert et al. [20] considered a different nonlinear procedure. Candès has informed us of work
in preparation considering also random matrices.
Finally we note that at some level of generality, we are discussing here the idea of obtaining sparse solutions to underdetermined systems of equations using ℓ_1 methods, which forms part of a now-extensive body of work: [6, 9, 11, 12, 13, 16, 17, 21, 22, 25, 28, 29]. We expect that many of the authors of the just-cited papers will be contributing to this special issue.
The theory just described raises several basic questions:
• How large do n and m have to be? Is (1.2) meaningless for practical problem sizes?
• How large is the constant C? Even if (1.2) applies, perhaps the constant is miserable?
• When the object is not perfectly reconstructed, what sorts of errors occur? Perhaps the
error term, though controlled in size by (1.2), has an objectionable structure?
• How should the CS framework be applied to realistic signals? In [14], models of spectroscopy and imaging were considered; for such models, it was proposed to deploy CS in a hybrid strategy, with coarse-scale measurements obtained by classical linear sampling, and the bulk obtained at fine scales by the CS strategy. Are such ideas, derived for the purpose of simplifying mathematical proofs, actually helpful in a concrete setting?
• What happens if there is noise in the measurements? Perhaps the framework falls apart
if there’s any noise in the observations – even just the small errors of floating point repre-
sentation.
Supposing that such questions do not have devastating answers, there are also natural ques-
tions about extending the method by extending the kind of matrices Φ which are in use:
• CS matrices Φ which can be rapidly applied. This is connected with the previous question. For applying the linear programming formulation of (1.1) it is very convenient [6] that Ψ and Φ – along with their transposes – can each be rapidly applied.
1.3 Experiments
In this paper we approach the above questions through computational experiments. We consider
each question, describe experiments to study it, create synthetic signals, and interpret the results
of our experiments.
In Section 2 we consider the application of CS to signals which have only a small number
of truly nonzero coefficients, getting empirical results which mirror the theory in [14]. Section 3
considers signals which have all coefficients nonzero, but which still have sparse coefficients as
measured by ℓ_p norms, 0 < p ≤ 1. The results in this setting are in some sense ‘noisier’ than
they were in Section 2. To alleviate that, we consider an extension of CS by post-processing to
remove the ‘noise’ in CS reconstructions. Section 4 considers a different attack on ‘noise’ based
on a noise-aware reconstruction method.
In Section 5, we consider an extension of CS mentioned in [14]: a hybrid scheme, using
conventional linear sampling and reconstruction for coarse-scale information and compressed
sampling on fine scale information. The ‘noise’ in reconstructions can be dramatically lower for
a given number of samples. Section 6 pushes this approach farther, deploying CS in a multiscale
fashion. Different scales are segregated and CS is applied separately to each one. Again, the
‘noise’ can be dramatically lower.
In Section 7, we explore the freedom available in the choice of CS-matrices, describing several
different matrix ensembles which seem to give equivalent results. While the approaches in [14]
involved random matrices, we have found that several other ensembles work well, including the
partial Fourier ensembles of [20, 3, 5].
In a final section we summarize our results.
Figure 1: Error of reconstruction ‖x̂_{1,n} − x_0‖_2 versus number of samples n, for (a) k = 20 nonzeros; (b) k = 50; (c) k = 100.
Figure 2: (a) Signal Blocks; (b) its expansion in a Haar wavelet basis. There are 77 nonzero coefficients.
Figure 3: CS reconstructions of Blocks from (a) n = 340 and (b) n = 256 measurements. (c) Translation-invariant denoising of (b).
3 Signals with Controlled ℓ_p Norm

Realistic signals do not typically have exact zeros anywhere in the transform domain. Hence, results of the type just shown, while instructive, are of limited practical interest.
Figure 4 panel (a) shows such a spiky object with controlled ℓ_p norm, p = 1/2, and panel (b) shows the reconstruction. Note that the largest few spikes are well-recovered, but not the many small ones.
As a more intuitive example, we considered the object Bumps from the Wavelab package [1],
with m = 2048. As Figure 5 panel (a) shows, the object is a superposition of smooth bumps.
Panel (b) shows that the large coefficients at each scale happen near the bump locations, and
panel (c) shows the decreasing rearrangement of the wavelet coefficients on a log scale. We
applied the CS framework with n = 256 and got the result in panel (a) of Figure 6; panel (b) shows the result with n = 512. Clearly both results are ‘noisy’, with, understandably, the ‘noisier’ one coming at the lower sampling rate.

Figure 4: CS reconstruction of a signal with controlled ℓ_p norm, p = 1/2. (a) Original signal, m = 1024; (b) CS reconstruction with n = 128 samples.
Indeed, the results in Figures 3 and 6 show that ‘noise’ sometimes appears in reconstructions
as the number of measurements decreases (even though the data are not noisy in our examples).
To alleviate this phenomenon, we considered the test cases shown earlier, namely Blocks and
Bumps, and applied translation-invariant wavelet de-noising [7] to the reconstructed ‘noisy’
signals. Results are shown in panel (c) of Figure 3 and panels (b) and (d) of Figure 6. At least
visually, there is a great deal of improvement.
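To indicate what this post-processing step involves, here is a hedged sketch of translation-invariant de-noising by cycle spinning in the spirit of [7] (our illustration, using the PyWavelets package; it applies a single global threshold for brevity, whereas our experiments used a level-dependent threshold):

```python
import numpy as np
import pywt

def ti_denoise(x, wavelet="haar", thresh=0.1, shifts=8):
    """Average hard-thresholded wavelet reconstructions over circular shifts."""
    out = np.zeros(len(x))
    for s in range(shifts):
        coeffs = pywt.wavedec(np.roll(x, s), wavelet)
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="hard")
                                for c in coeffs[1:]]
        out += np.roll(pywt.waverec(coeffs, wavelet), -s)
    return out / shifts
```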
4 Noise-Aware Reconstruction
So far we have not allowed for the possibility of measurement noise, digitization errors, etc. We remark that the theory allows for accommodating a small amount of noise already, through the ℓ_1-stability property proved in [14].
To make further accommodation for noise, our primary adjustment would be to use Basis Pursuit de-noising (BPDN) rather than Basis Pursuit. For a given noise level ε > 0, define the optimization problem

   (L_{1,ε})   min ‖Ψ^T x‖_1 subject to ‖y_n − ΦΨ^T x‖_2 ≤ ε.

This is equivalent, for an appropriate choice of Lagrange multiplier, to a problem which can be written as a linearly constrained convex quadratic program, and is considered practical to solve. In [6] BPDN was successfully used in cases where n < m and both are quite large: n = 8192 and m = 262144. Our proposal for dealing with noisy data is simply to measure y_n = ΦΨ^T x + noise and then use (L_{1,ε}) with an appropriate noise tolerance ε > 0.
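As one concrete way to experiment with this proposal, the following sketch approximates the penalized form of BPDN, min_θ ½‖y − Aθ‖_2² + λ‖θ‖_1, by iterative soft thresholding (our illustration only; [6] solves the problem by an interior-point quadratic-programming method, and the penalty λ plays the role of the tolerance ε):

```python
import numpy as np

def bpdn_ista(A, y, lam, n_iter=500):
    """Iterative soft thresholding for min 0.5*||y - A@theta||^2 + lam*||theta||_1."""
    L = np.linalg.norm(A, 2) ** 2            # step size from spectral norm of A
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = theta - A.T @ (A @ theta - y) / L                      # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return theta
```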
Extending the theory in [14] suggests that we should expect error bounds analogous to (1.2), with an additional term proportional to the noise level ε.
Figure 5: (a) Signal Bumps, m = 2048; (b) its wavelet coefficients; (c) decay of the coefficient magnitudes on a log scale.
Figure 6: CS reconstruction of Bumps from (a) n = 256 and (c) n = 512 measurements. Translation-invariant denoising in (b) and (d).
Figure 7: CSDN reconstruction of Blocks, m = 2048, n = 512. Signal and reconstructions are shown in the left panels, with corresponding wavelet expansions in the right panels.
To test this proposal, we took the wavelet expansions of the signals Blocks and Bumps and added noise to them. The noise was rescaled to enforce a specific noise level ε = 0.2. We applied the compressed sensing scheme with denoising (CSDN) to the noisy wavelet expansions. For comparison, we also attempted regular CS reconstruction. Results are shown in Figures 7 and 8. We used signal length m = 2048, and attempted reconstructions with n = 512. Indeed, the reconstruction achieved with CSDN is far superior to the CS reconstruction in both cases.
5 Two-Gender Hybrid CS
In [14] model spectroscopy and imaging problems were considered from a theoretical perspective.
CS was deployed differently there than so far in this paper – in particular, it was not proposed
that CS alone ‘carry all the load’. In that deployment, CS was applied to measuring only fine
scale properties of the signal, while ordinary linear measurement and reconstruction was used
to obtain the coarse-scale properties of the signal.
In more detail, the proposal was as follows; we spell out the ideas for dimension 1 only.
Expand the object x_0 in the wavelet basis

   x_0 = Σ_k β_{j_0,k} φ_{j_0,k} + Σ_{j=j_0}^{j_1} Σ_k α_{j,k} ψ_{j,k},

where j_0 is some specified coarse scale, j_1 is the finest scale, the φ_{j_0,k} are male wavelets at coarse scale, and the ψ_{j,k} are fine-scale female wavelets. Let α = (α_{j,k} : j_0 ≤ j ≤ j_1, 0 ≤ k < 2^j) denote the grouping together of all wavelet coefficients, and let β = (β_{j_0,k} : 0 ≤ k < 2^{j_0}) denote the male coefficients. Now consider a scheme where different strategies are used for the two genders α and β. For the male coarse-scale coefficients, we simply take direct measurements

   β̂ = (⟨φ_{j_0,k}, x_0⟩ : 0 ≤ k < 2^{j_0}).
Figure 8: CSDN reconstruction of Bumps, m = 2048, n = 512. Signal and reconstructions are shown in the left panels, with corresponding wavelet expansions in the right panels.
For the female fine-scale coefficients, we apply the CS scheme. Let m = 2^{j_1} − 2^{j_0}, and let the 2^{j_1} × m matrix Ψ have, for columns, the vectors ψ_{j,k} in some standard order. Given an n × m CS matrix Φ, define Ξ = ΦΨ^T, so that, in some sense, the columns of Ξ are ‘noisy’ linear combinations of the columns of Ψ. Now make n measurements

   y = Ξ x_0.

To reconstruct from these observations, define Ω = ΞΨ and consider the basis-pursuit optimization problem

   (BP)   min_a ‖a‖_1 subject to y_n = Ω a,   (5.1)

a minor relabelling of the (L_1) problem. Call the answer α̂. The overall reconstruction is

   x̂_hy = Σ_k β̂_{j_0,k} φ_{j_0,k} + Σ_{j=j_0}^{j_1} Σ_k α̂_{j,k} ψ_{j,k}.
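A hedged sketch of the measurement side of this scheme, using the PyWavelets package (the function name, the Haar filter, and the fourfold undersampling are illustrative assumptions; any basis-pursuit solver can then recover α̂ from the pair (Φ, y)):

```python
import numpy as np
import pywt

def hybrid_measure(x0, j0, rng, undersample=4):
    """Male coefficients measured directly; female coefficients compressed-sensed."""
    j1 = int(np.log2(len(x0)))
    coeffs = pywt.wavedec(x0, "haar", level=j1 - j0)
    beta = coeffs[0]                      # 2^j0 male (coarse) coefficients
    alpha = np.concatenate(coeffs[1:])    # m = 2^j1 - 2^j0 female coefficients
    n = len(alpha) // undersample
    Phi = rng.standard_normal((n, len(alpha))) / np.sqrt(n)
    return beta, Phi, Phi @ alpha         # direct samples, CS matrix, CS samples
```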
Figure 9: Reconstruction of Signal Blocks, m = 16384. (a) Linear reconstruction from 512 samples, ‖x̂_lin − x_0‖_2 = 0.091; (b) Hybrid CS reconstruction from 248 samples (gender-segregated), ‖x̂_hy − x_0‖_2 = 0.091; (c) Multiscale CS reconstruction from 208 samples, ‖x̂_ms − x_0‖_2 = 0.091.

Consider first Figure 9. Panel (a) shows the signal Blocks, of length m = 16384, reconstructed from n = 512 linear samples, and panel (b) shows the reconstruction from hybrid compressed samples (32 male samples, 176 compressed female samples). The accuracy is evidently comparable.
Now consider Figure 10. Panel (a) shows a bumpy signal of original length m = 16384 reconstructed from n = 1024 linear samples, and panel (b) shows the reconstruction from n_hy = 640 hybrid compressed samples (32 male samples, 608 compressed female samples). Again the reconstruction accuracy is comparable.
In [14] the idea of gender-segregated sampling was extended to higher dimensions in considering the class of images of bounded variation. The ideas are a straightforward extension of the 1-D case, and we shall not repeat them here. We investigate the performance of hybrid CS sampling applied to image data in the following experiment. Figure 11 shows a Mondrian image of size 1024 × 1024, so that m = 2^{20}. Figure 12 has reconstruction results with n = 4096 linear samples, and also with n_hy = 1152 hybrid compressed samples (128 male samples, 1024 compressed female samples). The reconstruction accuracy is evidently comparable.
Figure 10: Reconstruction of Signal Bumps, m = 16384. (a) Linear reconstruction from 1024 samples, ‖x̂_lin − x_0‖_2 = 0.0404; (b) Hybrid CS reconstruction from 640 samples (gender-segregated), ‖x̂_hy − x_0‖_2 = 0.0411; (c) Multiscale CS reconstruction from 544 samples, ‖x̂_ms − x_0‖_2 = 0.0425.
Figure 11: The Mondrian image (Rectangles), 1024 × 1024.
Figure 12: Reconstruction of the Mondrian image, m = 2^{20}. (a) Linear reconstruction from 4096 samples, ‖x̂_lin − x_0‖_2 = 0.227; (b) Hybrid CS reconstruction from 1152 samples (gender-segregated), ‖x̂_hy − x_0‖_2 = 0.228; (c) Multiscale CS reconstruction from 1048 samples, ‖x̂_ms − x_0‖_2 = 0.236.
6 Multiscale CS

Consider now a multilevel stratification of the object in question, partitioning the coefficient vector as

   [(β_{j_0,·}), (α_{j_0,·}), (α_{j_0+1,·}), . . . , (α_{j_1−1,·})].

We then apply ordinary linear sampling to measure the coefficients (β_{j_0,·}) directly, and then separately apply compressed sensing scale-by-scale, sampling data y_j about the coefficients (α_{j,·}) at level j using an n_j × 2^j CS matrix Φ_j. We obtain thereby a total of

   n = 2^{j_0} + Σ_{j=j_0}^{j_1−1} n_j

samples, compared to

   m = 2^{j_0} + 2^{j_0} + 2^{j_0+1} + · · · + 2^{j_1−1} = 2^{j_1}
coefficients in total. To obtain a reconstruction, we then solve the sequence of problems (5.1), one for each scale j = j_0, . . . , j_1 − 1, obtaining coefficient estimates α̂^{(j)}, and set

   x̂_ms = Σ_k β_{j_0,k} φ_{j_0,k} + Σ_{j=j_0}^{j_1−1} Σ_k α̂_k^{(j)} ψ_{j,k}.
(Of course, variations are possible; we might group together several coarse scales j_0, j_0+1, . . . , j_0+ℓ to get a larger value of m.)
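In the same hedged style, the scale-by-scale deployment might be sketched as follows (solve_l1 stands for any basis-pursuit solver, such as the linear-programming sketch in Section 1; the per-scale sample counts n_j are inputs, as in the experiments below):

```python
import numpy as np

def multiscale_cs(alphas, n_js, rng, solve_l1):
    """Apply a separate CS matrix to the wavelet coefficient block at each scale."""
    alpha_hats = []
    for a, n_j in zip(alphas, n_js):      # one block (alpha_j,.) per scale j
        Phi_j = rng.standard_normal((n_j, len(a))) / np.sqrt(n_j)
        alpha_hats.append(solve_l1(Phi_j, Phi_j @ a))
    return alpha_hats                     # recombine with beta via the inverse transform
```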
For an example of results obtained using multiscale CS, consider Figure 9(c). It shows the signal Blocks reconstructed from n_ms = 208 compressed samples (32 coarse-scale samples, 48 compressed samples at each scale). Indeed, the reconstruction accuracy is comparable to the linear and gender-segregated results. Similarly, Figure 10(c) has the multiscale reconstruction of Bumps from n_ms = 544 compressed samples (32 coarse-scale samples, 128 compressed samples at each scale). Again the reconstruction accuracy is comparable to that achieved by the other
methods. Finally, Figure 12(c) has the multiscale CS reconstruction of the Mondrian image from 1048 compressed samples (64 coarse-scale samples, 328 compressed samples at each scale).

Figure 13: (a) Disk image; (b) reconstruction from 480256 linear samples, ‖x̂_lin − x_0‖_2 = 0.0381; (c) reconstruction from 96459 multiscale compressed samples using the curvelet frame, ‖x̂_ms − x_0‖_2 = 0.0242; (d) reconstruction from the best N-term approximation with N = 96459, ‖x̂_{N-term} − x_0‖_2 = 0.00173.
Consider now an example working with a frame rather than an orthobasis, in this case the curvelet frame [2]. Theory supporting the possible benefits of using this frame for cartoon-like images was developed in [14].
As with the wavelet basis, there is a scale parameter j which specifies the size of the curvelet frame elements; we considered a deployment of multiscale compressed sensing which used a different n_j at each level. In Figure 13 we give an example deriving from a very simple black-and-white image depicting a disk. For comparison, we also include the result of best N-term approximation based on (hypothetical) direct measurement of the N most important curvelet coefficients.
To help the reader gain more insight into the level-by-level performance, we compare in
Figure 14 the disk coefficients and the reconstructed ones. Clearly, there is additional noise in
the reconstruction. Nonetheless, this noise is not evident in the overall CS reconstruction.
Figure 14: Exact curvelet coefficients of Disk at scale 7 (top), and reconstructed coefficients in the CS framework (bottom; n = 36044, ‖E‖_2 = 4.5768).
7 Alternate CS Matrices

We now explore the freedom available in the choice of CS matrices, considering several ‘random’ matrix ensembles:

• Random Signs Ensemble. Here Φ_ij has entries ±1/√n, with signs chosen independently and both signs equally likely.

• Uniform Spherical Ensemble. The columns of Φ are iid random uniform on the sphere S^{n−1}.

• Partial Fourier Ensemble. We select at random n rows out of the m × m Fourier matrix, getting an n-by-m Partial Fourier matrix.

• Partial Hadamard Ensemble. Similarly, we select at random n rows out of the m × m Hadamard matrix.

Candès, Romberg, and Tao [3] recently generated a great deal of excitement by showing several interesting properties of random Partial Fourier matrices and making claims about their possible use in compressed sensing [4]. Partial Hadamard matrices are known to generate near-optimal subspaces in certain special cases for the related problem of determining Kolmogorov n-widths of the octahedron b_{1,m} with respect to the ℓ_∞^m norm; see Pinkus’ book [26].
We also remark that there are numerous very interesting practical applications where Partial
Fourier and Partial Hadamard matrices are of direct interest, for example in Fourier transform
imaging and Hadamard transform spectroscopy.
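For reference, here is one way the four ensembles might be constructed (our sketch; the Hadamard construction assumes m is a power of 2, and in practice one would apply the partial Fourier and Hadamard matrices via fast transforms rather than forming them explicitly):

```python
import numpy as np
from scipy.linalg import hadamard

def random_signs(n, m, rng):
    return rng.choice([-1.0, 1.0], size=(n, m)) / np.sqrt(n)

def uniform_spherical(n, m, rng):
    G = rng.standard_normal((n, m))
    return G / np.linalg.norm(G, axis=0)      # columns iid uniform on S^{n-1}

def partial_fourier(n, m, rng):
    F = np.fft.fft(np.eye(m)) / np.sqrt(m)    # unitary m-by-m Fourier matrix
    return F[rng.choice(m, n, replace=False)]

def partial_hadamard(n, m, rng):              # m must be a power of 2
    H = hadamard(m) / np.sqrt(m)
    return H[rng.choice(m, n, replace=False)]
```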
Figure 15: Error versus number of measurements n: (a) Uniform Spherical Ensemble; (b) Random Signs Ensemble; (c) Partial Hadamard Ensemble; (d) Partial Fourier Ensemble.
In Figure 15, we compare the theoretical predictions in [14] with actual errors for the different matrix ensembles just defined. We do so as follows. For each ensemble, we consider an object defined as in Section 3, i.e. an m-vector with unit p-norm, whose k-th largest amplitude coefficient |θ|_(k) obeys (3.1). A typical example is shown in Figure 4(a). We consider families of experiments where n, the number of measurements, varies, and for each n, we apply the CS framework and measure the ℓ_2 reconstruction error. (In fact, we average over a number of instances.) Figure 15 depicts error versus number of samples n for the different ensembles, with p = 1/2. To validate the theoretical predictions, we also plot the error bound (1.2). For the purpose of comparison, we set the bound constant C_p in Eq. (1.2) to C_{1/2} = 8. As we can see, the simulation results are in line with the predicted error bound. Further, we observe that the different ensembles show similar behavior. This suggests that all such ensembles are equally good in practice.
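For completeness, here is a sketch of how such a test object might be generated (our construction, assuming the power-law decay |θ|_(k) ∝ k^{−1/p} behind (3.1), with random signs and positions):

```python
import numpy as np

def lp_object(m, p, rng):
    """m-vector with unit l_p quasi-norm; k-th largest magnitude ~ k^(-1/p)."""
    theta = np.arange(1, m + 1) ** (-1.0 / p) * rng.choice([-1.0, 1.0], m)
    theta /= np.sum(np.abs(theta) ** p) ** (1.0 / p)   # normalize l_p 'norm' to 1
    return rng.permutation(theta)
```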
8 Conclusions
In this paper, we have reviewed the basic Compressed Sensing framework, in which we gather
n pieces of information about an object which, nominally, has m degrees of freedom, n < m.
When the object is compressible in the sense that the ℓ_p norm of its transform coefficients is well-controlled, then by measuring n essentially ‘random’ linear functionals of the object and reconstructing by ℓ_1 minimization on the transform coefficients, we get, in theory, an accurate
reconstruction; see (1.2).
We raised a number of questions about its application. The first concerns the strength of
the theoretical bound (1.2). This bound contains unspecified constants, and so the practical
relevance of the framework depends heavily on the precise values of the constants. We con-
ducted a number of numerical experiments to test whether the constants in the basic theoretical
inequality are small enough to have practical implications, even at moderate n and m, m in the
low thousands, n in the few hundreds. We concluded that this is so.
We note, however, that the method comes off much more impressively when the underlying
object has relatively few nonzeros than it does when the object has a large number of small
nonzeros (as in the ℓ_p model). In effect, when the number of samples exceeds the number of
nonzeros by a factor of 3 or so, the method often performs exceptionally well. Unfortunately,
signals made of only a few nonzero terms are rather special.
At the same time, the work we have done is consistent with a value of C_{1/2} = 10 or smaller,
so the constants in (1.2) do not seem catastrophically large.
We conducted numerical experiments on objects Blocks and Bumps, caricaturing spectra and
scan lines in images; for such objects, whose visual appearance can be gauged, we found that
below a certain threshold on n, the CS reconstructions look visually noisy – even though there
is no noise in the observations.
Our main effort in this article has been to go beyond these initial observations by considering
several extensions of the CS framework. These extensions include:
• Postprocessing noise removal. In order to defeat the appearance of visual noise in the
CS reconstructions, we applied post-processing to the CS reconstruction by translation
invariant de-noising with a level-dependent threshold. We found the appearance of visual
noise dramatically reduced (for objects Blocks and Bumps).
• Allowing noise in the measurements. The basic CS framework does not allow for noise
in the observations. To handle the case where noise is present, we suggested the use of
a noise-tolerant ℓ_1 minimization, essentially using ‘Basis-Pursuit Denoising’ in place of
Basis Pursuit, on which CS is based. We presented examples showing that this can substantially reduce the effects of noise on the reconstruction.
• Multiscale Deployment. More significantly, we considered the use of CS not as the only way
to gather information about an object, but as a tool to be used in conjunction with classical
linear sampling at coarse scales. We considered a hybrid method, already discussed in [14]
from a theoretical perspective, in which very coarse-scale linear sampling was combined
with CS applied at all finer scales. We also introduced a fully multiscale method in which
CS was applied at each scale separately of all others, and linear sampling only at the coarser
scale. Both approaches can significantly outperform linear sampling, giving comparable
accuracy of reconstruction with far fewer samples in our examples. This tends to support
the theoretical claim in [14] that compressed sampling, deployed in a Hybrid fashion, can,
for certain kinds of objects, produce an order-of-magnitude improvement over classical
linear sampling.
• Alternate CS Matrices. Perhaps of most significance is our empirical finding that several
apparently different random matrix ensembles all perform similarly when used in the CS
framework. Such a finding, if true in general, is important for algorithmic reasons. To
apply interior-point methods to solve the linear program (1.1) requires many matrix-vector products Φu and Φ^T v for strategically chosen vectors u and v [6]. It is much faster to apply matrices Φ and Φ^T defined by the Partial Fourier and Partial Hadamard ensembles, where
the cost will be O(m log(m)) flops. Clearly, for large m and n one would much prefer to use
such matrices over the Uniform Spherical Ensemble or the Random Signs Ensemble, which
cost O(nm) flops to apply. It appears that we have many choices of matrices yielding
adequate performance in the CS framework, consistent with the Theorems in [14], and
that among those matrices are some which can be applied rapidly.
In short, it appears that Compressed Sensing is a fairly promising strategy for sampling certain kinds of signals characterized by large numbers of nominal samples and yet high compressibility in terms of transforms like the wavelet transform or Fourier transform. Our practical results already show gains by factors of 2, 4, or even more at moderate problem sizes, and these can be enhanced by deployment in a multiscale fashion and by applying de-noising to reconstructions.
Of course the theoretical implications of the bound (1.2) are even stronger than what we
observe here, in the sense that at least for n and m large, really dramatic benefits ought to be
possible for certain classes of compressible objects. We know that several groups are actively
exploring applications of these ideas, first of course being Candès, Romberg, and Tao, who are
actively pursuing the implications of [3, 4, 5], which have proved so inspiring. In addition,
R.R. Coifman at Yale has expressed interest in experimental work on this topic, and announced
his intention to conduct experiments in the CS framework. Clearly we can expect far more
experimentation and theoretical development of such ideas in the near future, and the interested
reader should find it profitable to pursue all the above-mentioned threads.
References
[1] J. Buckheit and D.L. Donoho (1995) WaveLab and reproducible research. In A. Antoniadis (Ed.), Wavelets and Statistics, Springer.
[2] E.J. Candès and D.L. Donoho (2004) New tight frames of curvelets and optimal representations of objects with piecewise C² singularities. Comm. Pure and Applied Mathematics LVII, 219-266.
[3] E.J. Candès, J. Romberg and T. Tao. (2004) Robust Uncertainty Principles: Exact Signal
Reconstruction from Highly Incomplete Frequency Information. Manuscript.
[4] Candès, E.J. and Romberg, J. (2004) Presentations at the Second International Conference on Computational Harmonic Analysis, Nashville, TN, May 2004.
[5] Candès, E.J. and Tao, T. (2004) Estimates for Fourier Minors, with Applications.
Manuscript.
[6] Chen, S., Donoho, D.L., and Saunders, M.A. (1999) Atomic Decomposition by Basis Pur-
suit. SIAM J. Sci Comp., 20, 1, 33-61.
[7] Coifman, R.R. and Donoho, D.L. (1995) Translation-invariant de-noising. In Wavelets and
Statistics, Antoniadis, A. and Oppenheim, G. (Eds.), Lect. Notes Statist., 103, pp. 125-150,
New York: Springer-Verlag.
[8] R.R. Coifman, Y. Meyer, S. Quake, and M.V. Wickerhauser (1994) Signal Processing and Compression with Wavelet Packets. In Wavelets and Their Applications, J.S. Byrnes, J.L. Byrnes, K.A. Hargreaves and K. Berry (Eds.).
[9] Donoho, D.L. and Huo, Xiaoming (2001) Uncertainty Principles and Ideal Atomic Decom-
position. IEEE Trans. Info. Thry. 47 (no.7), Nov. 2001, pp. 2845-62.
[10] Donoho, D.L. and Elad, M. (2003) Optimally Sparse Representation from Overcomplete Dictionaries via ℓ_1 Norm Minimization. Proc. Natl. Acad. Sci. USA 100 (5), 2197-2202.
18
[11] Donoho, D., Elad, M., and Temlyakov, V. (2004) Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise. Submitted. URL: http://www-stat.stanford.edu/~donoho/Reports/2004.
[12] Donoho, D.L. (2004) For most large underdetermined systems of linear equations, the minimal ℓ_1 solution is also the sparsest solution. Manuscript. Submitted. URL: http://www-stat.stanford.edu/~donoho/Reports/2004.
[13] Donoho, D.L. (2004) For most underdetermined systems of linear equations, the minimal ℓ_1-norm near-solution approximates the sparsest near-solution. Manuscript. Submitted. URL: http://www-stat.stanford.edu/~donoho/Reports/2004.
[14] Donoho, D.L. (2004) Compressed Sensing. Manuscript. Submitted. URL: http://www-stat.stanford.edu/~donoho/Reports/2004.
[15] A. Dvoretsky (1961) Some results on convex bodies and Banach Spaces. Proc. Symp. on
Linear Spaces. Jerusalem, 123-160.
[16] M. Elad and A.M. Bruckstein (2002) A generalized uncertainty principle and sparse representations in pairs of bases. IEEE Trans. Info. Thry. 48, 2558-2567.
[17] J.J. Fuchs (2004) On Sparse Representations in Arbitrary Redundant Bases. IEEE Trans. Info. Thry. 50 (no. 6), June 2004, pp. 1341-44.
[18] T. Figiel, J. Lindenstrauss and V.D. Milman (1977) The dimension of almost-spherical
sections of convex bodies. Acta Math. 139 53-94.
[19] Garnaev, A.Y. and Gluskin, E.D. (1984) On widths of the Euclidean ball. Soviet Mathematics – Doklady 30 (in English), 200-203.
[20] A.C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M.J. Strauss (2002) Near-optimal sparse Fourier representations via sampling. Proc. 34th ACM Symposium on Theory of Computing (STOC), 152-161.
[21] R. Gribonval and M. Nielsen (2003) Sparse Representations in Unions of Bases. IEEE Trans. Info. Thry. 49 (no. 12), Dec. 2003, pp. 3320-25.
[22] R. Gribonval and M. Nielsen. Highly Sparse Representations from Dictionaries are Unique
and Independent of the Sparseness Measure. Manuscript.
[23] Boris S. Kashin (1977) Diameters of certain finite-dimensional sets in classes of smooth
functions. Izv. Akad. Nauk SSSR, Ser. Mat. 41 (2) 334-351.
[24] S. Mallat and Z. Zhang (1993) Matching Pursuits with Time-Frequency Dictionaries. IEEE Trans. Sig. Proc. 41 (no. 12), pp. 3397-3415.
[25] B.K. Natarajan (1995) Sparse Approximate Solutions to Linear Systems. SIAM J. Comput. 24, 227-234.
[26] Pinkus, A. (1985) n-Widths in Approximation Theory. Berlin: Springer-Verlag.
[27] Pinkus, A. (1986) n-widths and Optimal Recovery in Approximation Theory. Proceedings of Symposia in Applied Mathematics, 36, Carl de Boor (Ed.). American Mathematical Society, Providence, RI.
[28] J.A. Tropp (2004) Greed is Good: Algorithmic Results for Sparse Approximation. IEEE Trans. Info. Thry. 50 (no. 10), Oct. 2004, pp. 2231-42.
[29] J.A. Tropp (2004) Just Relax: Convex Programming Methods for Subset Selection and Sparse Approximation. Manuscript.