Greedy Signal Space Recovery Algorithm W
Abstract—Compressive Sensing (CS) is a new paradigm for the efficient acquisition of signals that have a sparse representation in a certain domain. Traditionally, CS has provided numerous

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2014CB340404 and in part by the National Science Foundation of China under Grant 61471267.
J. Zhu is with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
S. Zhao is with the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
Q. Shi is with the School of Software Engineering, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
G. R. Arce is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716 USA (e-mail: [email protected]).

COMPRESSIVE sensing [1], [2] is a recently developed and fast-growing field of research built on a novel sampling paradigm. Suppose that x is a length-n signal. It is said to be k-sparse (or compressible) if x can be well approximated using only k ≪ n coefficients under some transform

  x = ψa,   (1)

where ψ is the sparsifying basis and a is the coefficient vector that has at most k nonzero entries.

For a k-sparse signal x, CS acquires m (m < n) random linear projections of x along incoherent directions, which constitute the measurements y with noisy perturbations e. The process is simply described as $y = Ax + e$, where A is the m × n sensing matrix.

The design of computationally efficient sparse signal recovery algorithms based on RIP recovery conditions for the $\ell_1$-norm relaxation has been extensively studied in previous works. Linear programming [1] and other convex optimization algorithms [4], [5] have been proposed to solve problem (3). The most common approaches include basis pursuit (BP), interior-point (IP) methods [6], homotopy [7], and the gradient projection for sparse representation (GPSR) algorithm [8]. It has been shown that the sparse signal recovery problem can be solved with stability and uniform guarantees at polynomially bounded computational complexity.

As a result, Greedy Pursuit (GP) [9] algorithms have also been widely studied. The predominant idea of GP algorithms is to estimate the nonzero elements of the coefficient vector iteratively. Matching Pursuit (MP) [10] is the earliest greedy pursuit algorithm, and Orthogonal Matching Pursuit (OMP) [11] is a well-known improvement of MP. Several other advanced GP algorithms have been proposed, such as Regularized OMP (ROMP) [12], Compressive Sampling Matching Pursuit (CoSaMP) [13], and Subspace Pursuit (SP) [14]. Generally speaking, GP algorithms have received considerable attention due to their low computational complexity.
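As a concrete illustration of the acquisition model in (1) and the measurement process above, the following Python sketch (our illustration, not the authors' code; the dimensions n, m, k, the random orthonormal basis, and the noise level are assumptions chosen for the example) generates a k-sparse coefficient vector a, forms x = ψa, and takes m < n noisy random projections y = Ax + e with a Gaussian sensing matrix whose entries are N(0, 1/m), as used later in the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example dimensions (assumed for illustration only).
n, m, k = 256, 80, 10

# Sparsifying basis psi: a random orthonormal basis (x = psi @ a).
psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

# k-sparse coefficient vector a.
a = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
a[support] = rng.standard_normal(k)

x = psi @ a                          # signal with a k-sparse representation

# Gaussian sensing matrix with entries ~ N(0, 1/m).
A = rng.standard_normal((m, n)) / np.sqrt(m)

e = 0.01 * rng.standard_normal(m)    # noise perturbation
y = A @ x + e                        # compressive measurements (m < n)

print(y.shape)                       # (80,)
```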
holds for all a satisfying $\|a\|_0 \le k$. The condition of D-RIP ensures norm preservation of all signals having a sparse representation x = Da. Thus, the condition of RIP is considered to be a stronger requirement when D is an overcomplete dictionary.

There are numerous methods to design matrices that satisfy the condition of D-RIP. To the best of our knowledge, the commonly used random matrices satisfy the condition of RIP or D-RIP with high probability.

More specifically, we consider matrices constructed as follows: we generate a matrix $A \in \mathbb{R}^{m \times n}$ by selecting the entries $A[M, N]$ as independent and identically distributed random variables. We impose two conditions on the random distribution. First, we require that the distribution is centered and normalized such that $A[M, N] \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \frac{1}{m})$. Second, we require that the random variable $\|Ax\|_2^2$ in [3] has expected value $\|x\|_2^2$; that is,

  $\mathbb{E}\|Ax\|_2^2 = \|x\|_2^2$.   (17)

Generally speaking, any distribution with bounded support, which includes the uniform distribution, is subgaussian, as is the Gaussian distribution.

The key property of subgaussian random variables that will be used in this paper is that any such matrix $A \in \mathbb{R}^{m \times n}$ satisfies, for a fixed vector (signal) x,

  $\Pr\left(\left|\|Ax\|_2^2 - \|x\|_2^2\right| \ge \varepsilon\|x\|_2^2\right) \le 4e^{-c_0(\varepsilon)m}$.   (18)

This implies that the matrix A will satisfy the condition of D-RIP with high probability as long as m is at least on the order of $k\log(d/k)$. Here the probability is taken over all draws of A, and the constant $c_0(\varepsilon)$ depends both on the particular subgaussian distribution and on the range of $\varepsilon$. Perhaps most important for our purpose is the following lemma.

Lemma 1: Let $\chi$ denote any k-dimensional subspace of $\mathbb{R}^n$. Fix $\delta, \alpha \in (0, 1)$. Suppose that A is an $m \times n$ random matrix with i.i.d. entries chosen from a distribution satisfying (18). With the minimal number of measurements required for exact recovery,

  $m = O\!\left(\frac{2k\log(42/\delta) + \log(4/\alpha)}{c_0(\delta/\sqrt{2})}\right)$,   (19)

it holds with probability exceeding $1 - \alpha$ that

  $\sqrt{1-\delta}\,\|x\|_2 \le \|Ax\|_2 \le \sqrt{1+\delta}\,\|x\|_2$   (20)

for all $x \in \chi$.
Proof: See Appendix B.

When D is an overcomplete dictionary, one can use Lemma 1 to go beyond a single k-dimensional subspace and instead consider all possible subspaces spanned by k columns of D, thereby establishing the condition of D-RIP for A. Then, we have the following Lemma.

Lemma 2: Let D be an overcomplete dictionary whose dimension is $n \times d$ and fix $\delta, \alpha \in (0, 1)$. With the minimal number of measurements required for exact recovery,

  $m = O\!\left(\frac{2k\log\!\big(42ed/(\delta k)\big) + \log(4/\alpha)}{c_0(\delta/\sqrt{2})}\right)$,   (21)

with e denoting the base of the natural logarithm, then with probability $1 - \alpha$, A will satisfy the condition of D-RIP of order k with the constant $\delta$. The proof follows that of Appendix B.

As noted above, the random matrix approach is quite useful for solving signal recovery problems. In this paper, we will further focus on random matrices in the development of our theory.

C. Bound for the Tail Energy

In this section, we focus on the tail energy since it plays an important role in our analysis of the convergence of the algorithm (see Section III-E below for details). In particular, we give some useful expansions to demonstrate the bound condition of the tail energy.

Definition 4: Suppose that x is a k-sparse signal in the overcomplete dictionary D domain and e is an additive noise (where $\|e\|_2 \le \varepsilon$); then we have

  $\tilde{e} = \|x - x_k\|_2 + \frac{\|x - x_k\|_1}{\sqrt{k}} + \|e\|_2$,   (22)

where $x_k$ is the best k-term approximation of x. The algorithm makes significant progress at each iteration where the recovery error is large relative to the tail energy. In the noisy case, the tail energy serves as the quantity measuring the baseline recovery error.

Assume that p is a number in the interval (0, 1). The signal x is p-compressible with magnitude R when the components of x, ordered such that $|x_1| \ge |x_2| \ge \cdots \ge |x_n|$, obey a power-law decay such that

  $|x_i| \le R \cdot i^{-1/p}, \quad \forall i = 1, 2, \cdots, n$.   (23)

According to (23), $\|x\|_1 \le R(1 + \log n)$ when p = 1, and a p-compressible signal is almost sparse when $p \approx 0$. In general, p-compressible signals serve as a model for approximately sparse signals, for which

  $\|x - x_k\|_1 \le D_1 \cdot R \cdot k^{1-1/p}$,
  $\|x - x_k\|_2 \le D_2 \cdot R \cdot k^{1/2-1/p}$,   (24)

where the constants $D_1 = (1/p - 1)^{-1}$ and $D_2 = (2/p - 1)^{-1/2}$. Note that (24) provides upper bounds on the two different norms of the approximation error $\|x - x_k\|$. Combining this result with (22), the tail energy of a p-compressible signal is upper bounded by

  $\tilde{e} \le 2D_1 \cdot R \cdot k^{1/2-1/p} + \|e\|_2$.   (25)

When the parameter p is small enough, the first term on the right-hand side of (25) decays rapidly as the sparsity level k increases.

D. Recovery of Approximately Sparse-Dictionary Signals from Incomplete Measurements

Considering signals that have a sparse representation in an overcomplete dictionary D, we theoretically provide a guarantee for exact recovery of sparse-dictionary signals. Analogously to the guarantees of GP algorithms, the proof relies on an iteration invariant which indicates that the recovery error is mostly determined by the number of iterations.
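As a numerical illustration of Definition 4 and the bound (25) (an example of our own, not from the paper; the values of p, R, k, n, and the noise level are assumptions), the sketch below builds a p-compressible vector obeying (23), computes the tail energy ẽ from its best k-term approximation, and compares it with the right-hand side of (25).

```python
import numpy as np

# Assumed example parameters (not from the paper): p, R, k, signal length, noise.
p, R, k, n = 0.5, 1.0, 10, 10_000
rng = np.random.default_rng(1)

# p-compressible signal obeying the power-law decay (23): |x_i| <= R * i**(-1/p).
i = np.arange(1, n + 1)
x = R * i ** (-1.0 / p)

e = 1e-3 * rng.standard_normal(200)   # noise vector; only its l2-norm enters (22) and (25)

# Best k-term approximation x_k keeps the k largest entries (here the first k).
x_k = np.where(i <= k, x, 0.0)
tail = x - x_k

# Tail energy (22): e_tilde = ||x - x_k||_2 + ||x - x_k||_1 / sqrt(k) + ||e||_2.
e_tilde = np.linalg.norm(tail) + np.linalg.norm(tail, 1) / np.sqrt(k) + np.linalg.norm(e)

# Upper bound (25): 2 * D1 * R * k**(1/2 - 1/p) + ||e||_2, with D1 = (1/p - 1)**(-1).
D1 = (1.0 / p - 1.0) ** (-1.0)
bound = 2.0 * D1 * R * k ** (0.5 - 1.0 / p) + np.linalg.norm(e)

print(f"tail energy = {e_tilde:.4f}")
print(f"bound (25)  = {bound:.4f}")
```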
Before stating the main result for the algorithm (Theorem 3), we first state the following Theorem.

Theorem 2: Assume that A satisfies the condition of D-RIP with the constant $\delta_{4k} < 0.1$. Let $S_D(x, k)$ be the near-optimal projection in (15) and $x^{l+1}$ be the approximation after l + 1 iterations. If $(1 + c_1)\left(1 - \frac{c_2^2}{(1+\beta)^2}\right) < 1$, the upper bound of the recovery error after l + 1 iterations is given by

  $\|x - x^{l+1}\|_2 \le \eta_1 \|e\|_2$,   (26)

where $\beta$ is an arbitrary constant and $\eta_1$ is a constant which depends on $c_1$, $c_2$, and $\beta$. Inspired by previous work in the signal space setting, the conditions of Theorem 2 on the near-optimal projections hold in cases where D is not unitary, and especially in cases where D is so highly overcomplete/redundant that it cannot satisfy the traditional condition of RIP. To the best of our knowledge, the classical GP algorithms are used to calculate the projections. Thus, in the following Theorem we provide a stronger convergence result for the algorithm even when the dictionary is highly overcomplete.

Theorem 3: Let A be a sensing matrix satisfying the condition of D-RIP of order 4k for a coefficient vector a such that x = Da. Then, the signal estimation $x^{l+1}$ after l + 1 iterations of the algorithm satisfies

  $\|x - x^{l+1}\|_2 \le C_1 \|x - x^l\|_2 + C_2 \|e\|_2$

with

  $C_1 = \big((2 + \lambda_1)\delta_{4k} + \lambda_1\big)(2 + \lambda_2)\sqrt{\frac{1 + \delta_{4k}}{1 - \delta_{4k}}}$,
  $C_2 = \frac{(2 + \lambda_2)\big((2 + \lambda_1)(1 + \delta_{4k}) + 2\big)}{\sqrt{1 - \delta_{4k}}}$.   (27)

Proof: The proof follows that of Theorem II.1 [36].

Notice that the constants $C_1$ and $C_2$ depend on the isometry constant $\delta_{4k}$ and on the approximation parameters $\lambda_1$ and $\lambda_2$. Further, an immediate Lemma of Theorem 3 is the following.

Lemma 3: Assume the conditions of Theorem 3. Particularly, for $l + 1 = \left\lceil \frac{\log(\|x\|_2/\|e\|_2)}{\log(1/C_1)} \right\rceil$ it holds that $\|x - x^{l+1}\|_2 \le \left(1 + \frac{1 - C_1^{l+1}}{1 - C_1}\right) C_2 \|e\|_2$.

Each iteration of the algorithm reduces the recovery error by a constant factor, while adding an additional noise component. By taking a sufficient number of iterations l, the term $2^{-l}\|x\|_2$ can be made as small as desired, and ultimately the recovery error is proportional to the noise level in the noisy measurements. If an accurate $S_D$ is provided, the upper bound on the recovery error in (29) is also comparable to commonly used results.

E. Recovery of Approximately Arbitrary Signals from Incomplete Measurements

As shown in the proof of Theorem 3, in the case where the signals have a sparse representation in D, smaller values of $c_1$ and $c_2$ result in a more accurate recovery, and the recovery can be made as accurate as desired by choosing $\|e\|_2$ small enough. However, this is not the case when signals do not exactly have a sparse representation in D, that is, if

  $y = A(Da_k) + A(x - Da_k) + e = A(Da_k) + \hat{e}$.   (30)

Notice that the term $\hat{e} = A(x - Da_k) + e$ can be viewed as the noise in the noisy measurements of the k-sparse signal $Da_k$ with $\|a_k\|_0 \le k$. In fact, the "new" noise $\hat{e}$ bounds the maximum achievable accuracy. For the sake of illustration, the condition of Lemma 4 still holds. Further, we state two Theorems and two Lemmas in this section, which can be considered as extensions of Theorem 3 and its Lemma (Lemma 4) to this case.

First, we state the following lemma, which can be considered as a generalization of Lemma 4.

Lemma 5: For the general CS model $y = A(Da_k) + \hat{e}$ in (30), if $\delta_{4k} < 0.1$, the upper bound of the recovery error is given by

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + \|x - Da_k\|_2 + 7.5\|A(x - Da_k)\|_2 + 7.5\|e\|_2$.   (31)

Then, after a constant number of iterations $l + 1 = \left\lceil \frac{\log(\|x\|_2/\|e\|_2)}{\log(1/C_1)} \right\rceil$,

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + \|x - Da_k\|_2 + 15\|A(x - Da_k)\|_2 + 15\|e\|_2$.
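The contraction in Theorem 3 is easy to explore numerically. The sketch below (illustrative only; C1, C2, ‖x‖2, and ‖e‖2 are assumed values rather than constants computed from an actual sensing matrix) unrolls the recursion ‖x − x^{l+1}‖2 ≤ C1‖x − x^l‖2 + C2‖e‖2 for the iteration count ⌈log(‖x‖2/‖e‖2)/log(1/C1)⌉ of Lemma 3 and compares the result with the fixed point C2‖e‖2/(1 − C1), which is proportional to the noise level as stated above.

```python
import math

# Assumed illustrative constants (must satisfy C1 < 1 for the recursion to contract).
C1, C2 = 0.5, 7.5
x_norm, e_norm = 1.0, 1e-3

# Iteration count suggested by Lemma 3.
L = math.ceil(math.log(x_norm / e_norm) / math.log(1.0 / C1))

err = x_norm                          # error bound after 0 iterations, using x^0 = 0
for l in range(L):
    err = C1 * err + C2 * e_norm      # one application of the Theorem 3 invariant

fixed_point = C2 * e_norm / (1.0 - C1)
print(f"iterations L = {L}")                          # 10 for these values
print(f"error bound after L iterations = {err:.6f}")
print(f"limit C2*||e||_2/(1 - C1)      = {fixed_point:.6f}")
```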
Using this Theorem to bound the right-hand side of (31), we derive

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + 7.5\|e\|_2 + \big(7.5\sqrt{1 + \delta_k} + 1\big)\|x - Da_k\|_2 + \frac{7.5\sqrt{1 + \delta_k}}{\sqrt{k}}\|x - Da_k\|_1$.   (32)

Particularly,

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + 15\|e\|_2 + \big(15\sqrt{1 + \delta_k} + 1\big)\|x - Da_k\|_2 + \frac{15\sqrt{1 + \delta_k}}{\sqrt{k}}\|x - Da_k\|_1$.   (33)

Denote

  $M(x) := \inf_{a_k : \|a_k\|_0 \le k}\left(\|x - Da_k\|_2 + \frac{1}{\sqrt{k}}\|x - Da_k\|_1\right)$,   (34)

which is the model mismatch quantity (for any $x \in \mathbb{R}^n$). Notice that (32) and (34) have a very similar form even in the case where D is not an overcomplete dictionary. Combining this result with (33), we have

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + 7.5\|e\|_2 + 8.5\sqrt{1 + \delta_k}\,M(x)$.

Particularly,

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + 15\|e\|_2 + 16\sqrt{1 + \delta_k}\,M(x)$.   (35)

Notice that this quantity bounds the above recovery error. If the chosen quantity is large enough, then the signal is neither a k-sparse signal nor a compressible signal, i.e., $x \ne Da_k$, which implies that the signal does not exactly have a sparse representation in D.

Similarly, we derive an upper bound on the recovery error that is closely related to the tail energy, as stated in the following Theorem.

Theorem 5: Let A be a sensing matrix satisfying the condition of D-RIP. Assume that $\delta_{4k} < 0.1$. Given the assumption that the modifications to (33) hold for the general CS model, the upper bound of the recovery error is given by

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + 10\tilde{e}$.   (36)

Particularly,

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + 20\tilde{e}$.

Proof: See Appendix F.

After l + 1 iterations, the term $2^{-l}\|Da_k\|_2$ can be made small enough that $\lim_{l\to\infty} 2^{-l}\|Da_k\|_2 = 0$, and the recovery error depends only on the tail energy, which implies that the algorithm makes significant progress per iteration in this case.

Recall that the term $\|A(x - Da_k)\|_2$ bounds the recovery error in (31). The assumption that the modifications to (31) hold implies that there exists an upper bound on the term $\|A(x - Da_k)\|_2$ in the coefficient space, as stated in the following Lemma.

Lemma 6: If AD satisfies the condition of D-RIP with the constant $\delta_{4k} < 0.1$, then using the extension of (32) yields

  $\|A(x - x_k)\|_2 = \|AD(a - a_k)\|_2 \le \sqrt{1 + \delta_k}\left(\|a - a_k\|_2 + \frac{\|a - a_k\|_1}{\sqrt{k}}\right)$.   (37)

Using this Lemma, i.e., that the term $\|AD(a - a_k)\|_2$ bounds the recovery error in the coefficient space, we derive

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + 7.5\|e\|_2 + \|x - Da_k\|_2 + 7.5\sqrt{1 + \delta_k}\left(\|a - a_k\|_2 + \frac{1}{\sqrt{k}}\|a - a_k\|_1\right)$.

Particularly,

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + 15\|e\|_2 + \|x - Da_k\|_2 + 15\sqrt{1 + \delta_k}\left(\|a - a_k\|_2 + \frac{1}{\sqrt{k}}\|a - a_k\|_1\right)$,   (38)

where $a_k$ is the best k-sparse approximation of a. If the $a_k$ we chose is arbitrarily compressible, then $a = a_k$, which implies that the upper bound in (38) is reasonably small.

F. Computation Complexity of the Algorithm

In this section, we further obtain the following result regarding the convergence speed of the algorithm.

Recall that $\hat{x} = x^{l+1}$ is the output of the algorithm after l + 1 iterations. Given a positive parameter $\eta$, the algorithm produces a signal estimation $\hat{x}$ after at most $O(\log(\|x\|_2/\eta))$ iterations such that

  $\|x - \hat{x}\|_2 = O(\eta + \|e\|_2) = O\big(\max\{\eta, \|e\|_2\}\big)$.   (39)

The cost of one iteration of the algorithm is dominated by the cost of steps 1 and 6 of the algorithm, as presented in Table I. The first step is to obtain the proxy $u = A^*r$ and the signal estimation $\tilde{x}$. The next step is to calculate the support approximation $S_D$ efficiently; the classical GP algorithms, which include OMP, ROMP, CoSaMP, and SP, are used to estimate $S_D$. The running time of these algorithms over an $n \times d$ dictionary D is O(knd) or O(nd). Therefore, the overall running time of these GP algorithms is $O(knd\log(\|x\|_2/\eta))$ or $O(nd\log(\|x\|_2/\eta))$. Notice that the dictionary D is overcomplete. For sparse signal recovery, these running times are in line with advanced bounds for the algorithm, which implies that the algorithm has linear convergence, as shown in Fig. 3.

Interestingly, we now turn to the case where the number of measurements required is calculated by reducing the approximation recovery error when the R-SNR exists. Thus, given a sparse-dictionary signal x with $\|x\|_2 \le 2R$, the upper bound of the SNR is given by

  $\mathrm{R\text{-}SNR} = 10\log\!\left(\frac{\|x\|_2}{\|x - \hat{x}\|_2}\right) = 10\log\!\left(\frac{\|x\|_2}{\|x - D\hat{a}\|_2}\right) \le 10\log\!\left(\frac{\|x\|_2}{\|x - x_k\|_2}\right) = 10\log\!\left(\frac{\|x\|_2}{\tilde{e}}\right) \le 10\log\!\left(\frac{2R}{2D_1 \cdot R \cdot k^{1/2 - 1/p}}\right) = 10\left[\log\!\left(\frac{1}{p} - 1\right) + \left(\frac{1}{p} - \frac{1}{2}\right)\log k\right]$,   (40)

where $\hat{x}$ is the approximation of x. The number of iterations required is $O(\log k)$. Therefore, if a fixed R-SNR can be guaranteed, the overall running time of the algorithm is $O(nd\log(\|x\|_2/\eta)\cdot \mathrm{SNR})$ in this case, which further implies that the computational complexity of the algorithm is nearly linear in the signal length.
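The right-hand side of (40) depends only on p and k. The following sketch (our illustration; the base-10 logarithm, the value of p, and the sparsity levels are assumptions made for this example) evaluates 10[log(1/p − 1) + (1/p − 1/2) log k] for a few values of k, showing the logarithmic growth of the achievable R-SNR that underlies the O(log k) iteration count.

```python
import math

def r_snr_bound_db(p: float, k: int) -> float:
    """Right-hand side of (40): 10*[log(1/p - 1) + (1/p - 1/2)*log k].

    Base-10 logarithms are used here, following the usual dB convention
    (an assumption made for this illustration)."""
    return 10.0 * (math.log10(1.0 / p - 1.0) + (1.0 / p - 0.5) * math.log10(k))

p = 0.3                      # assumed compressibility parameter
for k in (4, 16, 64, 256):   # assumed sparsity levels
    print(f"k = {k:4d}   R-SNR bound = {r_snr_bound_db(p, k):6.2f} dB")
```

Each fourfold increase in k adds a fixed number of dB under this bound, consistent with the logarithmic dependence on k in (40).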
Fig. 4: Performance comparison of perfect signal recovery frequency for signals having a k = 10 sparse representation in a renormalized orthogonal dictionary D: (a) with noise-free measurements; (b) with noisy measurements.

As can be seen in Fig. 5(a), we compare the performance of eight different algorithms for the case where the nonzero entries of a are well separated. Fig. 5(a) shows that SSSP(LP) performs better than the other algorithms when using a classical algorithm like LP for the near-optimal projection $S_D(x, k)$. This is because LP is able to find $\Lambda_{\mathrm{opt}}(x, k)$ exactly when $x = P_{\Lambda_{\mathrm{opt}}(x,k)}x$ and the nonzero entries of $\Lambda_{\mathrm{opt}}(x, k)$ are sufficiently well separated. Also, Fig. 5(a) shows that OMP, CoSaMP, and SP are not efficient algorithms for signal recovery in this case because the sensing matrix A and the overcomplete dictionary D are highly coherent, which indicates that the combined matrix AD cannot satisfy the condition of the RIP.

As can be seen in Fig. 5(b), we compare the performance of eight different algorithms for the case where the nonzero entries of a are clustered. Fig. 5(b) shows that

V. CONCLUSION

In this paper, we present support estimation techniques for a greedy sparse-dictionary signal recovery algorithm. Using this method, we propose the signal space subspace pursuit algorithm based on the signal space method and establish theoretical signal recovery guarantees. We observe that the accuracy of the algorithm in this setting depends on the signal structure, even though its conventional recovery guarantees are independent of the signal structure. We analyze the behavior of the signal space method when the dictionary is highly overcomplete and thus does not satisfy typical conditions like the RIP or incoherence. Under specific assumptions on the signal structure, we demonstrate that the signal space method can be used to optimally approximate projections. Thus, our analysis provides theoretical backing to explain the observed phenomena. According to the simulation results and through comparison with several other commonly used algorithms, in both the noise-free and noisy cases, the algorithm achieves outstanding recovery performance.

APPENDIX

A. Proof of the condition in Definition 3: if the dictionary is orthonormal, the value of the localization factor is one

To complete the proof, we introduce the following Theorem.

Theorem A.1: Suppose that $x \in \Sigma_k$; then

  $\frac{\|x\|_1}{\sqrt{k}} \le \|x\|_2 \le \sqrt{k}\,\|x\|_\infty$.   (42)

Proof: For any x, $\|x\|_1 = |\langle x, \mathrm{sgn}(x)\rangle|$. By applying the Cauchy-Schwarz inequality we obtain $\|x\|_1 \le \|x\|_2 \|\mathrm{sgn}(x)\|_2$. The lower bound follows since $\mathrm{sgn}(x)$ has k largest entries all equal to ±1 (where $x \in \Sigma_k$), and thus the $\ell_2$-norm of $\mathrm{sgn}(x)$ is $\sqrt{k}$. The upper bound is obtained by observing that each of the k largest entries of x can be upper bounded by $\|x\|_\infty$.

Proof: We now bound the right-hand side of (11). Note that $\frac{\|x\|_1}{\sqrt{k}} \le \|x\|_2$ by Theorem A.1. Then we have

  $\eta = \frac{\|D^*Dx\|_1}{\sqrt{k}} \le \|D^*Dx\|_2$.   (43)
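Theorem A.1 can be sanity-checked numerically. The sketch below (illustrative only; the dimensions and the number of trials are assumptions) draws random k-sparse vectors and verifies ‖x‖1/√k ≤ ‖x‖2 ≤ √k ‖x‖∞ for every draw.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 256, 8, 1000    # assumed dimensions and number of random draws

ok = True
for _ in range(trials):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)          # random k-sparse vector

    l1, l2, linf = np.linalg.norm(x, 1), np.linalg.norm(x), np.linalg.norm(x, np.inf)
    # Inequality (42): ||x||_1 / sqrt(k) <= ||x||_2 <= sqrt(k) * ||x||_inf.
    ok &= (l1 / np.sqrt(k) <= l2 + 1e-12) and (l2 <= np.sqrt(k) * linf + 1e-12)

print("inequality (42) held in all trials:", ok)
```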
Fig. 5: Frequency of signal recovery out of 1000 trials for different SSSP variants when the nonzero entries in a are well separated (a) and when the nonzero entries in a are clustered (b). Here, k = 8, n = 256, d = 1024, the dictionary $D \in \mathbb{R}^{n \times d}$ is a 4x overcomplete DFT, and $A \in \mathbb{R}^{m \times n}$ is a Gaussian matrix.

Notice that $\|\eta\|_2 = 1$ in this setting where D is orthonormal. Combining this result with (43), we have

  $\eta = 1 \le \sup_{\|Dx\|_2 = 1,\, \|x\|_0 \le k} \|D^*Dx\|_2$.   (44)

Using the Cauchy-Schwarz inequality again, we can get

  $\sup_{\|Dx\|_2 = 1,\, \|x\|_0 \le k} \|D^*Dx\|_2 \le \sup_{\|Dx\|_2 = 1,\, \|x\|_0 \le k} \|D\|_2 \|Dx\|_2$.   (45)

Combining (44) and (45), we see that

  $1 \le \sup_{\|Dx\|_2 = 1,\, \|x\|_0 \le k} \|D\|_2 \|Dx\|_2 = \sup_{\|x\|_0 \le k} \|D\|_2$.   (46)

The last equation holds because $\|Dx\|_2 = 1$. Thus, (46) is equivalent to $1 \le \|D\|_2$. In particular, we need

  $\|D\|_2 = 1$,   (47)

where the equation follows from the fact that D is orthonormal and hence $\|D\|_2 = 1$, i.e., D = I. This completes the proof of the condition in Definition 3.

B. Proof of Lemma 1

To complete the proof, we introduce the following lemma.

Lemma B.1: Let $A \in \mathbb{R}^{m \times n}$ be a random matrix following any distribution satisfying the condition (18). Given the assumptions, for any given set T with $\|T\|_0 \le k$ and $\delta \in (0, 1)$, we have

  $(1 - \delta)\|x\|_2 \le \|Ax\|_2 \le (1 + \delta)\|x\|_2 \quad (x \in X_T)$   (48)

with probability at least

  $1 - 4\left(\frac{4}{a}\right)e^{-c_0(\varepsilon)m}$,   (49)

where $X_T$ is the set of all vectors in $\mathbb{R}^n$ indexed by T.

Proof: Note that $\|x\|_2 = 1$ in this case. Thus $(1 - \delta) \le \|Ax\|_2 \le (1 + \delta)$. Assume that all the vectors q are normalized, i.e., $\|q\|_2 = 1$, for a finite set of points $Q_T$ with $Q_T \subseteq X_T$. Then, we have

  $\min_{q \in Q_T} \|x - q\|_2 \le \frac{\delta}{4}$ (with $\|Q_T\|_0 \le \frac{4}{a}$).   (50)

Applying (18) to this set of points with the parameter $\varepsilon = \delta/2$, and with probability exceeding the right-hand side of (49), results in

  $\left(1 - \frac{\delta}{2}\right)\|q\|_2^2 \le \|Aq\|_2^2 \le \left(1 + \frac{\delta}{2}\right)\|q\|_2^2 \quad (q \in Q_T)$.   (51)

To simplify the derivation, notice that (51) can be trivially represented without the quadratic constraint on $\|Aq\|_2^2$ and $\|q\|_2^2$. The inequality (51) is equivalent to requiring

  $\left(1 - \frac{\delta}{2}\right)\|q\|_2 \le \|Aq\|_2 \le \left(1 + \frac{\delta}{2}\right)\|q\|_2 \quad (q \in Q_T)$.   (52)

Since $\delta_{\min}$ is defined as the smallest number for which

  $\|Ax\|_2 \le (1 + \delta_{\min})\|x\|_2 \quad (x \in X_T)$   (53)

holds, it remains to show that $\delta_{\min} \le \delta$. Recall that the vectors x are normalized, i.e., $\|x\|_2 = 1$. For a given point $q \in Q_T$, the inequality (48) holds if the following inequality holds:

  $\|x - q\|_2 \le \frac{\delta}{4}$.   (54)

Combining (53) and (54), we have

  $\|Ax\|_2 \le \|Aq\|_2 + \|A(x - q)\|_2 \le 1 + \frac{\delta}{2} + (1 + \delta_{\min})\frac{\delta}{4}$.   (55)

Because $\delta_{\min}$ is the smallest number for which (53) holds, the inequality (55) yields the following condition:

  $\delta_{\min} \le \frac{3\delta/4}{1 - \delta/4} \le \delta$.   (56)
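The covering argument above relies on the concentration property (18) of subgaussian matrices. As an empirical illustration (not part of the proof; the dimensions, trial count, and Gaussian distribution are assumptions), the sketch below draws matrices with entries N(0, 1/m) and estimates how often ‖Ax‖2² deviates from ‖x‖2² by more than ε for a fixed unit-norm k-sparse x.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k, trials = 256, 80, 8, 2000     # assumed dimensions and number of draws

# Fixed unit-norm k-sparse test vector.
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x /= np.linalg.norm(x)

deviations = []
for _ in range(trials):
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # entries ~ N(0, 1/m), so E||Ax||^2 = ||x||^2
    deviations.append(abs(np.linalg.norm(A @ x) ** 2 - 1.0))

deviations = np.array(deviations)
for eps in (0.1, 0.2, 0.3):
    frac = np.mean(deviations >= eps)
    print(f"P(| ||Ax||^2 - 1 | >= {eps:.1f}) ~ {frac:.4f}")   # decays in eps, as (18) predicts
```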
Thus, we complete the proof of the upper bound in (51). Similarly, according to the definition of $\delta_{\min}$, we derive the lower bound of (51):

  $\|Ax\|_2 \ge \|Aq\|_2 - \|A(x - q)\|_2 \ge 1 - \frac{\delta}{2} - (1 + \delta)\frac{\delta}{4} \ge 1 - \delta$.   (57)

Proof: Assume that there exist $C_n^k \le \left(\frac{42}{\delta}\right)^{2k}$ such subspaces. Lemma B.1 shows that the bound fails to hold with probability at most

  $4\left(\frac{42}{\delta}\right)^{2k}\left(\frac{4}{a}\right)e^{-c_0(\delta/2)m}$.   (58)

If $k \le c_1 m/\log(n/k)$, then

  $4e^{-c_0(\delta/2)m + 2k\log(42/\delta) + \log(4/a)} \le 4e^{-c_2 m}$,   (59)

where both $c_1$ and $c_2$ are positive constants. The next step is to simplify both sides of (59) by leaving out the exponential prefactor 4e, such that

  $c_2 \le c_0\!\left(\frac{\delta}{2}\right)m - \frac{2k}{m}\left(\log\frac{42}{\delta} + \frac{1}{k}\log\frac{4}{a}\right) \le c_0\!\left(\frac{\delta}{2}\right)m - c_1\left(\frac{2\log(42/\delta)}{\log(n/k)} + \frac{2\log(a/4)}{\log(n/k)}\right) \le c_0\!\left(\frac{\delta}{2}\right)m - c_1\,\frac{2\big(\log(42/\delta) + \log(a/4)/k\big)}{\log(n/k)}$.   (60)

It is sufficient to choose $c_2 > 0$ if $c_1$ is small enough. This completes the proof of Lemma 1.

C. Proof of Lemma 3

Proof: Recall that when the number of iterations $l + 1 = \left\lceil \frac{\log(\|x\|_2/\|e\|_2)}{\log(1/C_1)} \right\rceil$ holds, it can be derived that

  $\|x - x^{l+1}\|_2 \le C_1^{l+1}\|x - x^0\|_2 + (1 + p + p^2 + \cdots + p^l)C_2\|e\|_2$.   (61)

To obtain the second bound in Lemma 3, we simply solve the error recursion and note that

  $(1 + p + p^2 + \cdots + p^l)C_2\|e\|_2 \le \left(1 + \frac{1 - C_1^{l+1}}{1 - C_1}\right)C_2\|e\|_2$.   (62)

Combining (61) and (62), we see that

  $\|x - x^{l+1}\|_2 \le C_1^l\|x - x^0\|_2 + \left(1 + \frac{1 - C_1^{l+1}}{1 - C_1}\right)C_2\|e\|_2 \le \left(1 + \frac{1 - C_1^{l+1}}{1 - C_1}\right)C_2\|e\|_2$.   (63)

It follows that after finitely many iterations, the upper bound in (63) depends closely on the last inequality, due to the formula for the geometric series, the choice of l, and the fact that $x^0 = 0$. This completes the proof of Lemma 3.

D. Proof of Lemma 5

Recall that (29) in the general CS model (30) is equivalent to requiring

  $\|x_k - x^{l+1}\|_2 = \|Da_k - x^{l+1}\|_2 \le 0.5\|Da_k - x^l\|_2 + 7.5\|A(x - Da_k) + e\|_2 \le 0.5\|Da_k - x^l\|_2 + 7.5\|A(x - Da_k)\|_2 + 7.5\|e\|_2$.   (64)

Using the triangle inequality, we can get

  $\|x - x^{l+1}\|_2 = \|x - Da_k + Da_k - x^{l+1}\|_2 \le \|x - Da_k\|_2 + \|Da_k - x^{l+1}\|_2$.   (65)

Combining this result with (64), we obtain

  $\|x - x^{l+1}\|_2 \le 0.5\|Da_k - x^l\|_2 + \|x - Da_k\|_2 + 7.5\|A(x - Da_k)\|_2 + 7.5\|e\|_2$.   (66)

Note that the support of $a_k$ contains the indices of the k largest entries in x. Thus, $x_k$ is a best k-sparse approximation to x, i.e., $\|Da_k\|_2 \le \|x\|_2$. Using this to bound the right-hand side of (66) yields

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + \|x - Da_k\|_2 + 7.5\|A(x - Da_k)\|_2 + 7.5\|e\|_2$.   (67)

Repeating the same steps above, we can similarly derive

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + \|x - Da_k\|_2 + 15\|A(x - Da_k)\|_2 + 15\|e\|_2$.   (68)

This completes the proof of Lemma 5.

E. Proof of Theorem 4

To complete the proof, we introduce the following Lemma.

Lemma E.1: Let $\Lambda_0$ be an arbitrary subset of $\{1, 2, \ldots, n\}$ such that $|\Lambda_0| \le k$. For any signal $x \in \mathbb{R}^n$, we define $\Lambda_1$ as the index set corresponding to the k largest entries of $x_{\Lambda_0^c}$ (in absolute value), $\Lambda_2$ as the index set corresponding to the next k largest entries, and so on. Then

  $\sum_{i \ge 2} \|x_{\Lambda_i}\|_2 \le \frac{\|x_{\Lambda_0^c}\|_1}{\sqrt{k}}$.   (69)

Proof: We begin by observing that for $i \ge 2$,

  $\|x_{\Lambda_i}\|_\infty \le \frac{\|x_{\Lambda_{i-1}}\|_1}{k}$,   (70)

since the $\Lambda_i$ sort x into blocks of decreasing magnitude. Recalling that (42) still holds, we can derive

  $\sum_{i \ge 2} \|x_{\Lambda_i}\|_2 \le \sqrt{k}\sum_{i \ge 2} \|x_{\Lambda_i}\|_\infty \le \frac{1}{\sqrt{k}}\sum_{i \ge 1} \|x_{\Lambda_i}\|_1 = \frac{\|x_{\Lambda_0^c}\|_1}{\sqrt{k}}$.   (71)
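Lemma E.1 can likewise be checked numerically. The sketch below (illustrative only; the dimensions are assumptions) partitions the entries outside Λ0 into blocks of size k in order of decreasing magnitude and verifies Σ_{i≥2}‖x_{Λi}‖2 ≤ ‖x_{Λ0^c}‖1/√k for a random vector.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 64, 8                                    # assumed dimensions

x = rng.standard_normal(n)
Lambda0 = rng.choice(n, size=k, replace=False)  # arbitrary index set with |Lambda0| <= k

# Indices outside Lambda0, sorted by decreasing magnitude, then cut into blocks of size k:
# Lambda1 holds the k largest remaining entries, Lambda2 the next k, and so on.
rest = np.setdiff1d(np.arange(n), Lambda0)
rest = rest[np.argsort(-np.abs(x[rest]))]
blocks = [rest[j:j + k] for j in range(0, len(rest), k)]

lhs = sum(np.linalg.norm(x[b]) for b in blocks[1:])     # sum over blocks with i >= 2
rhs = np.linalg.norm(x[rest], 1) / np.sqrt(k)           # ||x_{Lambda0^c}||_1 / sqrt(k)
print(f"sum_(i>=2) ||x_Lambda_i||_2 = {lhs:.4f} <= {rhs:.4f}")
```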
Proof: We begin by partitioning the signal (vector) x into the vectors $\{x_{\Lambda_1}, x_{\Lambda_2}, \ldots, x_{\Lambda_n}\}$ in decreasing order of magnitude. Subsets $\{\Lambda_1, \Lambda_2, \ldots, \Lambda_n\}$ with $\|\Lambda_i\|_0 \le k$ are chosen such that they are all disjoint. Note that $\|Ax\|_2 = \left\|\sum_{i=1}^n A_{\Lambda_i}x_{\Lambda_i}\right\|_2$. Combining this with the upper bound of (4), we have

  $\|Ax\|_2 = \left\|\sum_{i=1}^n A_{\Lambda_i}x_{\Lambda_i}\right\|_2 \le \sum_{i=1}^n \|A_{\Lambda_i}x_{\Lambda_i}\|_2 \le \sqrt{1 + \delta_k}\sum_{i=1}^n \|x_{\Lambda_i}\|_2 < \sqrt{1 + \delta_k}\left(\|x_{\Lambda_1}\|_2 + \sum_{i=2}^n \|x_{\Lambda_i}\|_2\right)$.   (72)

Combining (71) and (72), we see that

  $\|Ax\|_2 = \left\|\sum_{i=1}^n A_{\Lambda_i}x_{\Lambda_i}\right\|_2 < \sqrt{1 + \delta_k}\left(\|x_{\Lambda_1}\|_2 + \frac{\|x_{\Lambda_0^c}\|_1}{\sqrt{k}}\right) \le \sqrt{1 + \delta_k}\left(\|x_{\Lambda_1}\|_2 + \frac{\|x_{\Lambda_1}\|_1}{\sqrt{k}}\right)$.   (73)

Note that $\Lambda_1$ contains the indices of the $\|\Lambda_1\|_0 \le k$ largest entries in x. Thus, $x_{\Lambda_1}$ is a best k-sparse approximation to x, i.e., $\|x_{\Lambda_1}\|_2 \le \|x\|_2$. Using this to bound the right-hand side of the last inequality in (73) yields

  $\sqrt{1 + \delta_k}\left(\|x_{\Lambda_1}\|_2 + \frac{\|x_{\Lambda_1}\|_1}{\sqrt{k}}\right) \le \sqrt{1 + \delta_k}\left(\|x\|_2 + \frac{\|x\|_1}{\sqrt{k}}\right)$.   (74)

Combining the above two inequalities yields (32). This completes the proof of Theorem 4.

F. Proof of Theorem 5

To complete the proof, we introduce the following Lemma.

Lemma F.1: Let x be an arbitrary signal in $\mathbb{R}^n$. The measurements with noise perturbation $y = Ax + e$ can also be denoted as $y = Ax_k + \hat{e}$, where

  $\|\hat{e}\|_2 \le 1.14\left(\|x - x_k\|_2 + \frac{\|x - x_k\|_1}{\sqrt{k}}\right) + \|e\|_2$.   (75)

Proof: Notice that the term $\|A(x - Da_k)\|_2 + \|e\|_2$ ultimately bounds the recovery error in (31). Combining this with (32), we have

  $\|A(x - x_k)\|_2 + \|e\|_2 \le \sqrt{1 + \delta_k}\,\|x - x_k\|_2 + \|e\|_2 \le \sqrt{1 + \delta_k}\left(\|x - x_k\|_2 + \frac{\|x - x_k\|_1}{\sqrt{k}}\right) + \|e\|_2$,   (76)

where the last inequality in (76) follows from the fact that $\delta_k < 1/3$ and hence $\sqrt{1 + \delta_k} \le 1.14$.

Proof: Recall (31) in Lemma 5. Thus, the recovery error for such a signal estimation can be bounded from above as

  $\|x - x^{l+1}\|_2 \le 0.5\|x - x^l\|_2 + \|x - Da_k\|_2 + 7.5\big(\|A(x - Da_k)\|_2 + \|e\|_2\big) \le 0.5\|x - x^l\|_2 + 9.55\|x - Da_k\|_2 + \frac{8.55}{\sqrt{k}}\|x - Da_k\|_1 + 7.5\|e\|_2 < 0.5\|x - x^l\|_2 + 10\tilde{e}$.   (77)

Repeating the same steps above, we similarly have

  $\|x - x^l\|_2 \le 2^{-l}\|Da_k\|_2 + 20\tilde{e}$.   (78)

This completes the proof of Theorem 5.

ACKNOWLEDGMENT

The authors would like to thank Prof. Xu Ma and the anonymous reviewers for their insightful comments and constructive suggestions, which have greatly improved the paper.

REFERENCES

[1] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[2] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
[4] P. A. Randall, Sparse Recovery via Convex Optimization. California Institute of Technology, 2009.
[5] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, p. 717, 2009.
[6] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming. SIAM, 1994, vol. 13.
[7] C. Soussen, J. Idier, J. Duan, and D. Brie, "Homotopy based algorithms for l0-regularized least-squares," IEEE Transactions on Signal Processing, 2015.
[8] T. Blumensath and M. E. Davies, "Gradient pursuits," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2370–2382, 2008.
[9] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Processing, vol. 86, no. 3, pp. 572–588, 2006.
[10] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
[11] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[12] D. Needell and R. Vershynin, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 310–316, 2010.
[13] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[14] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, 2009.
[15] D. L. Donoho, "Sparse components of images and optimal atomic decompositions," Constructive Approximation, vol. 17, no. 3, pp. 353–382, 2001.
[16] A. C. Gilbert and J. A. Tropp, "Applications of sparse approximation in communications," in Proceedings of the International Symposium on Information Theory (ISIT), 2005, pp. 1000–1004.
[17] B. D. Rao, "Signal processing with the sparseness constraint," in IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, vol. 3, pp. 1861–1864.
[18] T. B. Cilingiroglu, A. Uyar, A. Tuysuzoglu, W. C. Karl, J. Konrad, B. B. Goldberg, and M. S. Ünlü, "Dictionary-based image reconstruction for superresolution in integrated circuit imaging," Optics Express, vol. 23, no. 11, pp. 15072–15087, 2015.
[19] N. R. Reyes, P. V. Candeas, and F. L. Ferreras, "Wavelet-based approach for transient modeling with application to parametric audio coding," Digital Signal Processing, vol. 20, no. 1, pp. 123–132, 2010.
[20] J. L. Lin, W. L. Hwang, and S. C. Pei, "Video compression based on orthonormal matching pursuits," in IEEE International Symposium on Circuits and Systems (ISCAS), 2006, 4 pp.
[21] D. Malioutov, M. Cetin, and A. S. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 3010–3022, 2005.