
Greedy Signal Space Recovery Algorithm with


Overcomplete Dictionaries in Compressive Sensing
Jianchen Zhu, Student Member, IEEE, Shengjie Zhao, Senior Member, IEEE, Qingjiang Shi, Member, IEEE,
Gonzalo R. Arce, Fellow, IEEE

Abstract—Compressive Sensing (CS) is a new paradigm for the efficient acquisition of signals that have a sparse representation in a certain domain. Traditionally, CS has provided numerous methods for signal recovery over an orthonormal basis. However, modern applications have sparked the emergence of related methods for signals that are not sparse in an orthonormal basis but in some arbitrary, perhaps highly overcomplete, dictionary, particularly due to their potential to generate different kinds of sparse representations of signals. To this end, we apply a signal space greedy method, which relies on the ability to optimally project a signal onto a small number of dictionary atoms, to address signal recovery in this setting. We describe a generalized variant of the iterative recovery algorithm called Signal Space Subspace Pursuit (SSSP) for this more challenging setting. Here, using the Dictionary-Restricted Isometry Property (D-RIP) rather than the classical RIP, we derive a lower bound on the number of measurements required and then provide a proof of convergence for the algorithm. With both noisy and noise-free measurements, the algorithm has low computational complexity and provides high recovery accuracy. Simulation results show that the algorithm outperforms the existing recovery algorithms.

Index Terms—Compressive Sensing (CS), sparse representation, overcomplete dictionary, signal space greedy method, projection, D-Restricted Isometry Property (D-RIP).

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2014CB340404 and in part by the National Science Foundation of China under Grant 61471267.
J. Zhu is with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
S. Zhao is with the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
Q. Shi is with the School of Software Engineering, Tongji University, Shanghai 201804, China (e-mail: [email protected]).
G. R. Arce is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716 USA (e-mail: [email protected]).

I. INTRODUCTION

Compressive sensing [1], [2] is a recently developed and fast-growing field of research built on a novel sampling paradigm. Suppose that x is a length-n signal. It is said to be k-sparse (or compressible) if x can be well approximated using only k ≪ n coefficients under some transform

x = ψa,   (1)

where ψ is the sparsifying basis and a is the coefficient vector that has at most k nonzero entries.

For a k-sparse signal x, CS takes m (m < n) random linear projections of the signal, which constitute the measurements y with noisy perturbations e. The process is simply described as

y = Ax + e,   (2)

where A ∈ R^{m×n} represents the sensing matrix and e is the acquisition noise. Since m < n, problem (2) is ill-posed, and the perturbations e lead to unstable solutions. However, using the fact that x is sparse, with ||x||_0 = k ≪ n, it is possible to recover x exactly by solving an l1-minimization problem

x̂ = arg min ||x||_1   s.t.   ||y − Ax||_2 ≤ ε,   (3)

where ||e||_2 ≤ ε bounds the norm of the noise vector e. Therefore, x can be exactly recovered by solving problem (3) provided the conditions of the Restricted Isometry Property (RIP) are satisfied.

Definition 1 [3]: The sensing matrix A ∈ R^{m×n} is said to satisfy the k-order RIP if for any k-sparse signal x ∈ R^n (where ||x||_0 ≤ k)

(1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2,   (4)

where δ ∈ [0, 1]. The infimum of δ, denoted by δ_k, is called the restricted isometry constant (RIC) of A:

δ_k := inf{ δ : (1 − δ)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ)||x||_2^2 }.   (5)

The design of computationally efficient sparse signal recovery algorithms based on the RIP recovery conditions for the l1-norm relaxation has been extensively studied in previous works. Linear programming [1] and other convex optimization algorithms [4], [5] have been proposed to solve problem (3). The most common approaches include basis pursuit (BP), interior-point (IP) methods [6], homotopy [7], and the gradient projection for sparse representation (GPSR) algorithm [8]. These methods solve the sparse signal recovery problem with stability and uniform guarantees, but their computational complexity, although polynomially bounded, can be high.

As a result, Greedy Pursuit (GP) [9] algorithms have also been widely studied. The predominant idea of GP algorithms is to estimate the nonzero elements of a coefficient vector iteratively. Matching Pursuit (MP) [10] is the earliest greedy pursuit algorithm, and Orthogonal Matching Pursuit (OMP) [11] is a well-known improved version of MP. Several other advanced GP algorithms have been proposed, such as Regularized OMP (ROMP) [12], Compressive Sampling Matching Pursuit (CoSaMP) [13], and Subspace Pursuit (SP) [14]. Generally speaking, the GP algorithms have received considerable attention due to their low computational complexity, high recovery accuracy, and simple implementation.
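To make the greedy-pursuit idea concrete, the following minimal sketch (ours, not part of the original paper; written in Python with NumPy, with illustrative names) implements plain OMP for the model y = Ax + e. It repeatedly selects the column of A most correlated with the residual and re-fits by least squares.

import numpy as np

def omp(A, y, k, tol=1e-8):
    """Minimal Orthogonal Matching Pursuit sketch (illustrative)."""
    m, n = A.shape
    residual = y.copy()
    support = []
    x_hat = np.zeros(n)
    for _ in range(k):
        # proxy: correlations between the columns of A and the current residual
        proxy = A.T @ residual
        idx = int(np.argmax(np.abs(proxy)))
        if idx not in support:
            support.append(idx)
        # least-squares fit restricted to the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_hat = np.zeros(n)
        x_hat[support] = coef
        residual = y - A @ x_hat
        if np.linalg.norm(residual) <= tol:
            break
    return x_hat

CoSaMP and SP follow the same proxy/least-squares pattern but select several indices per iteration and prune the support back to size k.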
In some cases, the signal of interest is not itself sparse but has a sparse representation in an overcomplete dictionary D. Examples are found in a wide range of applications [15]–[17], including image [18], audio [19], and video compression [20], and source localization [21]. Proposals for sparse signal representations in an overcomplete dictionary include multiscale Gabor functions [22], systems defined by algebraic codes [23], wavelets and sinusoids [24], and multiscale windowed ridgelets [25].

Numerous methods, both heuristic and theoretical, have been developed to support the benefits of such sparse signal representations: in theoretical neuroscience it has been argued that sparse signal representations in an overcomplete dictionary are necessary for use in biological vision systems [26]; in approximation theory, it has been demonstrated that approximation from overcomplete systems outperforms any known basis [27]; in signal processing, overcomplete dictionaries learned from a set of realizations of the data (training signals) are highly adapted to the given class of signals and therefore usually exhibit good representation performance [28]; and in image processing, learned dictionaries have shown promising results in several recently published works on compression of facial images [29], fingerprint images [30], geometry images of 3D face models [31], synthetic aperture radar (SAR) images [32], and hyperspectral images [33].

Consider the sparse signal representation based on atoms in a dictionary, taken as the columns of a matrix D ∈ R^{n×d}. A sparse representation of the signal x ∈ R^n can be thought of as a coefficient vector a ∈ R^d in the dictionary D. The dictionary is overcomplete if d > n. Hence, the representation of the signal in the dictionary D is not unique; namely, there exist a variety of coefficient vectors that can be used to synthesize the signal. Furthermore, when the columns of the dictionary are highly correlated, the matrix AD arising from a measurement y may no longer satisfy the RIP. Hence, coefficient space methods, which aim at recovering the coefficient vectors, encounter a bottleneck due to the lack of orthogonality of the dictionary.

In this paper, we present a recovery method that exploits the structure of the signal and builds on greedy pursuit. Our main contribution is to develop a method which we call the signal space method [34]–[36]. The advantage of the method is its ability to optimally project the signal onto a small number of dictionary atoms under the correct support hypothesis at each iteration. By leveraging this technique, we present a novel GP algorithm called Signal Space Subspace Pursuit (SSSP). Furthermore, when the sensing matrix A satisfies the D-RIP, we extend the algorithm to the accurate recovery of signals that are sparse in an overcomplete dictionary and provide a proof of convergence using the D-RIP. In particular, rigorous bounds are derived showing that the algorithm recovers the ideal sparse representation with a recovery error that grows at most proportionally to the noise level. Finally, the simulations demonstrate that the algorithm provides significant gains in perfect recovery performance compared to those of existing greedy algorithms as well as the l1-minimization algorithm via BP.

Fig. 1: The compressive sensing process and its domains. It distinguishes the domains in which the measurements, signals, and coefficients reside.

The rest of the paper is organized as follows. We begin in Section II with a description of the mathematical model and its advantages. Section III develops our proposed algorithm and gives the convergence and recovery conditions of the proposed algorithm in detail. Section IV provides experimental results for the algorithm and its comparison with other existing CS recovery algorithms. Finally, we conclude in Section V.

II. SYSTEM MODEL ON OVERCOMPLETE DICTIONARIES

A. Compressive Sensing Model

Suppose we have the compressive measurements y ∈ R^{m×1} of an unknown sparse signal x ∈ R^{n×1} given by

y = Ax + e,   (6)

where A ∈ R^{m×n} (m ≤ n) is the sensing matrix and e ∈ R^{m×1} is the system noise. The sparsity condition on the signal x is that x = Da for some coefficient vector a ∈ R^{d×1} with ||a||_0 ≤ k ≤ n, yielding a k-sparse representation of the signal x with respect to the dictionary D ∈ R^{n×d} (n ≤ d). Thus, our task is to recover x based on y, A and D. Before presenting the algorithm, we first describe the considered signal sparsity model and the assumption of "analysis sparsity" on the analysis vector D*x in the following subsection.

B. Signal Sparsity Model

Fig. 1 illustrates the sparse-dictionary setting and recovery framework [37] considered for CS. Given a signal x, let D = [D_1, D_2, ..., D_d] ∈ R^{n×d} be an n × d matrix whose columns form a Parseval frame for R^n, i.e.

x = Σ_k ⟨x, D_k⟩ D_k,
||x||_2^2 = Σ_k |⟨x, D_k⟩|^2,   ∀x ∈ R^n,   (7)

where ⟨·, ·⟩ denotes the standard Euclidean inner product. Notice that an overcomplete dictionary D ∈ R^{n×d} is a Parseval frame if DD^T = I, and a coefficient vector a is compressible or k-sparse if ||a||_0 ≤ k. Then, using a natural extension of the definition of the RIP, the D-RIP is defined as follows.

Definition 2 [34]: Fix an overcomplete dictionary D ∈ R^{n×d}. A is said to satisfy the restricted isometry property adapted to D (abbreviated D-RIP) with constant δ_k if

(1 − δ_k)||Da||_2^2 ≤ ||ADa||_2^2 ≤ (1 + δ_k)||Da||_2^2   (8)
holds for all ||a||_0 ≤ k.

Besides, for all k-sparse signals x in R^n, the constant δ_k can be calculated as

δ_k = max_{T ⊂ {1,...,d}, |T| ≤ k} ||(AD)_T^T (AD)_T − I||_2,   (9)

where (AD)_T is the m × |T| submatrix of AD whose columns are indexed by T, and the superscript T denotes transposition. Notice that when D is the identity, the definition of the D-RIP reduces to the traditional definition of the RIP. Numerous random matrices, such as Gaussian and Bernoulli matrices, satisfy the D-RIP with high probability, which implies that the number of measurements required is on the order of k log(d/k). Under such an assumption on A, a modification of (3), known as the l1-analysis method, recovers a signal x from noise-free or noisy measurements by solving the convex minimization problem

x̂ = arg min_{x̃ ∈ R^n} ||D*x̃||_1   s.t.   ||y − Ax̃||_2 ≤ µ,   (10)

where µ is the noise level, with ||e||_2 ≤ µ. The l1-analysis method is based on the model assumption that for a signal x = Da not only the coefficient vector a but also the analysis vector D*x is sparse. Under this assumption, and with the additional factor ||D*x − (D*x)_k||_1/√k, the upper bound on the recovery error of the l1-analysis method is given by O(||D*x − (D*x)_k||_1/√k + ||e||_2), where (D*x)_k is a best k-sparse approximation of D*x. Notice that the term ||D*x − (D*x)_k||_1 bounds the recovery error in this case. If the analysis vector D*x has a suitable decay, the recovery error depends only on the noise level ||e||_2 in the measurements. Under these assumptions, the recovery errors of the l1-analysis method and of the algorithm we design are both proportional to ||e||_2, and the convergence behavior of the algorithm is similar to that of the l1-analysis method (see Sections III-D, III-E and III-F below for details).

A weaker assumption used in the convergence analysis of the algorithm, covering all signals corresponding to a coefficient vector a, involves the localization factor defined as follows.

Definition 3: For a dictionary D ∈ R^{n×d} and a sparsity level k, we define the localization factor as

η_{k,D} = η := sup_{||Da||_2 = 1, ||a||_0 ≤ k} ||D*Da||_1 / √k.   (11)

The localization factor can be viewed as a measure of how sparse the analysis objective in problem (10) is. Notice that if D is orthonormal, then η = 1 (proof: see Appendix A), and η increases with the redundancy of D.

III. SIGNAL SPACE SUBSPACE PURSUIT ALGORITHM

Firstly, we propose the algorithm, built on GP algorithms, in the sparse-dictionary framework. Secondly, using the D-RIP condition, we provide a guarantee on the minimum number of measurements required. Thirdly, we derive a bound that theoretically provides a sufficient condition for exact signal recovery, demonstrating the provable performance of the algorithm.

A. Algorithm Design

An overview of the algorithm is introduced first. Then, the flow of the algorithm is given and several key steps are analyzed. Finally, the advantages of the algorithm are discussed in detail.

Before introducing the iterative algorithm for sparse signal recovery, the following notation will be used in the formulation of the recovery algorithm.

Notation: Suppose that A is an m × n matrix and that we observe a set of noisy measurements of the form y = Ax + e. For an index set (support set) Λ ⊂ {1, 2, ..., d}, we let D_Λ ∈ R^{n×|Λ|} denote the submatrix of D whose columns are indexed by Λ, and we let R(D_Λ) represent the column span of D_Λ.

Remark: Note that the algorithm requires some knowledge of the sparsity level k, and there are effective approaches to approximate this parameter. One alternative is to conduct empirical studies over a range of sparsity levels and select the level which minimizes ||y − ADã||_2.

As will be shown, the most remarkable novelty of the algorithm is that the signal can be recovered exactly in an overcomplete dictionary. This makes the algorithm more general and improves the selection of effective atoms. The main steps of the algorithm are summarized below.

TABLE I: Signal Space Subspace Pursuit Algorithm

Algorithm: Signal Space Subspace Pursuit
Input:
  Sensing matrix A; dictionary D; measurements y; sparsity level k; stopping criterion ε
Initialization:
  Iteration counter l = 0; support estimate I = ∅; initial residual r^0 = y; initial approximation x^0 = 0
while halting criterion is not satisfied do
  S1: Find the index set Ω = S_D(u, 3k) with the proxy u = A*r^l {indices corresponding to the largest-magnitude entries of the proxy}
  S2: Form the support estimate T = Ω ∪ I
  S3: Calculate the signal estimate: x̃ = Dã = D(arg min_{a_T} ||y − ADa||_2 s.t. a_{T^C} = 0)
  S4: Shrink the index set: I = S_D(x̃, k) {indices corresponding to the largest-magnitude entries of the estimate x̃}
  S5: Calculate the new signal estimate: x^{l+1} = P_I x̃
  S6: Calculate the new residual: r^{l+1} = y − Ax^{l+1}
end while: stop when l = MaxIter or ||x^{l+1} − x^l||_2 / ||x^l||_2 ≤ ε is satisfied
Output:
  Signal estimate x̂ = x^{l+1} = SSSP(A, D, y, k)

Before calculating the proxy, the column vectors of the sensing matrix A = [A_1, ..., A_n] should first be normalized in practice. The column of A most relevant to the residual r^l is selected so as to minimize the next residual r^{l+1}.

Fig. 2: Projection of the residual error.

As can be seen in Fig. 2, although the product ⟨r^l, A_i⟩ is larger than ⟨r^l, A_j⟩, the projection of r^l onto A_j is longer than its projection onto A_i, and thus the residual left by A_j is smaller than that left by A_i. Therefore the column that yields the smallest residual, or equivalently the largest projection length, should be selected. The projection length onto the column vector A_i can be written as

p = ⟨r^l, A_i⟩ / ||A_i||_2 = ⟨r^l, A_i/||A_i||_2⟩,   (12)

where ||A_i||_2 denotes the Euclidean length of the vector A_i, and A_i/||A_i||_2 is the normalized column. So, in practice, the sensing matrix A should be normalized before calculating the proxy.

Analogous to the classical GP algorithms, the most fundamental step is to calculate the observation (proxy) u during each iteration, as shown in step S1 of the algorithm. This common step accounts for most of the computation in all GP algorithms, even in the case where D is an overcomplete dictionary. Besides, a very similar step appears in the initialization procedure of these GP algorithms. The initial estimate is x^0 = 0 and the initial residual is the vector of input measurements r^0 = y, from which an initial proxy u = A*r^0 for the support estimate T is computed: T^0 = Ω^0 ∪ I^0, where Ω^0 = S_D(u, 3k) is a subroutine identifying the indices of the 3k largest-magnitude entries of u. In iteration l + 1, the algorithm updates the previous estimate x^l by computing the residual r^l and the proxy A*r^l (a step in the steepest-descent direction). A new support estimate, T^{l+1}, is then obtained by merging the previous estimates Ω^l and I^l such that T^{l+1} = Ω^l ∪ I^l. Finally, after the (l+1)-th iteration, a new support estimate I^{l+1} is obtained by shrinking the index set, I^{l+1} = S_D(x̃, k). The algorithm then updates the solution x^{l+1} to minimize the residual ||y − ADa_{I^{l+1}}||_2 when restricted to I^{l+1}, and calculates the new residual r^{l+1} = y − Ax^{l+1}. The choice of stopping criterion plays an important role for the algorithm; the stopping criteria used in the experiments (such as the normalized relative error ||x^{l+1} − x^l||_2 / ||x^l||_2 ≤ ε) are outlined in Section IV.

Next, recall some key steps of the classical GP algorithms. The GP algorithms identify nonzero entries of the support of the signal at each iteration. Given a support estimate T and after a given number of iterations l + 1, the least-squares solution based on the corresponding support estimate is

x̃ = D(A†_{T^{l+1}} y) = D((A_{T^{l+1}}^T A_{T^{l+1}})^{-1} A_{T^{l+1}}^T y),   (13)

where A†_{T^{l+1}} is the pseudoinverse of A_{T^{l+1}}. If the accurate support estimate T^{l+1} is provided, then y = Ax = A_{T^{l+1}} a_{T^{l+1}} with A_{T^{l+1}} of full column rank, and combining this with (13) yields x = x̃. These steps are trivial and can be performed by simple thresholding of the entries of the coefficient vector in the case where D is orthonormal, i.e., D = I. Thus, our task in sparse-dictionary signal recovery is to correctly identify the support estimate T. The algorithm we design solves this problem by iteratively identifying likely columns, performing a projection which, given a general vector, finds the closest k-sparse vector, and then deciding which columns of A to choose. In the representation case (when D is an overcomplete dictionary), simple hard thresholding is replaced with an appropriate operator that takes a candidate signal and finds the best k-sparse representation of a vector z. Towards this end, in the signal space, we define

Λ_opt(z, k) = arg min_{Λ: |Λ| ≤ k} ||z − P_Λ z||_2,   (14)

where P_Λ denotes the projection onto the span of the columns of D indexed by Λ. This problem is itself reminiscent of the conventional CS problem; one wants to recover a sparse representation from an underdetermined linear system [36]. Thus, we conclude that it is an NP-hard problem in general. Therefore, we allow a near-optimal projection to be used in the algorithm, writing S_D(x, k) to denote a near-optimal k-sparse approximation of x in D. Even with such approximate projections S_D, the algorithm is, perhaps surprisingly, able to recover x exactly.

In general, (14) appears to be NP-hard because it requires examining all possible k-subsets of the columns of the overcomplete dictionary. To overcome this difficulty, an approximation is needed. For this we instead look for a near-optimal projection scheme, as used in our algorithm. It has been shown that as long as the near-optimal projection is good enough, namely,

||P_{S_D(x,k)} x − x||_2 ≤ c_1 ||P_Λ x − x||_2,
||P_{S_D(x,k)} x||_2 ≥ c_2 ||P_Λ x||_2,   (15)

for all x and suitable constants c_1 and c_2 (where P_Λ denotes the optimal projection), then the algorithm provides accurate recovery of the signal. Although such projections exist for a well-behaved dictionary D, they are not known to exist when the dictionary is highly redundant. Interestingly, empirical studies using classical GP algorithms for such projections show that the algorithm using these projections still yields exact recovery in this setting.
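As a concrete illustration of the steps in Table I, the following Python/NumPy sketch (ours, not the authors' code; the names sssp and sd_project are illustrative) implements one reading of the iteration, using OMP over the dictionary D as the near-optimal projection S_D(·, k), which is one of the classical GP choices mentioned above.

import numpy as np

def sd_project(z, D, s):
    """Near-optimal projection S_D(z, s): greedily select s dictionary atoms (OMP over D)."""
    r = z.copy()
    support = []
    for _ in range(s):
        proxy = D.T @ r
        proxy[support] = 0.0                      # do not re-select chosen atoms
        support.append(int(np.argmax(np.abs(proxy))))
        coef, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        r = z - D[:, support] @ coef
    return support

def sssp(A, D, y, k, max_iter=50, eps=1e-6):
    """Sketch of the Signal Space Subspace Pursuit iteration of Table I."""
    n = D.shape[0]
    x = np.zeros(n)
    I = []
    r = y.copy()
    for _ in range(max_iter):
        Omega = sd_project(A.T @ r, D, 3 * k)                    # S1: proxy support
        T = sorted(set(Omega) | set(I))                          # S2: merge supports
        a_T, *_ = np.linalg.lstsq(A @ D[:, T], y, rcond=None)    # S3: least-squares fit on T
        x_tilde = D[:, T] @ a_T
        I = sd_project(x_tilde, D, k)                            # S4: shrink support
        D_I = D[:, I]
        x_new = D_I @ np.linalg.lstsq(D_I, x_tilde, rcond=None)[0]   # S5: x^{l+1} = P_I x~
        r = y - A @ x_new                                        # S6: new residual
        if np.linalg.norm(x_new - x) <= eps * max(np.linalg.norm(x), 1e-12):
            x = x_new
            break
        x = x_new
    return x

Any other support-estimation routine (SP, CoSaMP, or an l1 solver) can be substituted for sd_project, which is exactly how the SSSP variants compared in Section IV are formed.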

holds for all a satisfying kak0 ≤ k. the condition of D- The proof follows that of Appendix B.
RIP ensures norm preservation of all signals having a sparse with e denoting the base of the natural logarithm, then with
representation x = Da. Thus, the condition of RIP is consid- probability 1 − α, A will satisfy the condition of D-RIP of
ered to be a stronger requirement when D is an overcomplete order k with the constant δ.
dictionary. As noted above, the random matrix approach is somewhat
There are numerous methods to design matrices that satisfy useful to help us solve signal recovery problems. In this
the condition of D-RIP. To the best of our knowledge, the paper, we will further focus on the random matrices in the
commonly used random matrices satisfy the condition of RIP development of our theory.
or D-RIP with high probability.
More speciffic, we consider matrices constructed as fol- C. Bound for the Tail Energy
lows: we generate a matrix A ∈ Rm×n by selecting the
entries A[M, N ] as independent and identically distributed In this section, we focus on the tail energy since it plays
random variables. We impose two conditions on the random a important role in our analysis of the convergence of the
distribution. First, we require that the distribution is centered algorithm (see Section III-E below for details). In particular,
idd 1 we give some useful expansions to demonstrate the bound
and normalized such that A[M, N ] ∼ N (0, m ). Second, we
2 condition of the tail energy.
require that the random variable kAxk2 in [3] has expected Def inition 4: Suppose that x is a k-sparse signal in the
2
value kxk2 ; that is, overcomplete dictionary D domain and e is an additional noise
(where kek2 ≤ ε), then we have
 
2 2
E kAxk2 = kxk2 . (17)
kx − xk k1
Generally speaking, any distribution, which includes the ẽ = kx − xk k2 + √ + kek2 , (22)
k
Gaussian and uniform distribution, with bounded support is
subgaussian. where xk is the best approximation of x. The algorithm makes
The key property of subgaussian random variables that will significant progress at each iteration where the recovery error
be of used in this paper is that any matrix A ∈ Rm×n which is large relative to the tail energy. In noisy case, the tail energy
for a fixed vector (signal) x satisfies as the quantity measure the baseline recovery error.
  Assume that p is a number in the interval (0, 1). Let
2 2 2
Pr kAxk2 − kxk2 ≥ ε kxk2 ≤ 4e−c0 (ε)m . (18) the signal x is p-compressible with magnitude R when the
components of x such that |x1 | ≥ |x2 | ≥ · · · ≥ |xn | obey a a
It implies that the matrix A will satisfy the condition of power law decay such that
D-RIP with high probability as long as m is at least on the
order of k log(d/k). From this, the probability is taken over all |xi | ≤ R ∗ i−1/p , ∀x = 1, 2, · · · n. (23)
draws of A and the constant c0 (ε) rely both on the particualr
According to (23), kxk1 ≤ R ∗ (1 + log n) when p = 1 and
subgaussian distribution and the range of ε. Perhaps the most
p-compressible signal is almost sparse when p ≈ 0. In general,
important for our purpose is the following lemma.
the p-compressible signals apply to approximate sparse signals
Lemma 1: Let χ denote any k-dimensiinal of Rn . Fix
such that
δ, α ∈ (0, 1). Suppose that A is a m × n random matrix
with i.i.d entries chosen from a distribution satisfying (18), kx − xk k1 ≤ D1 ∗ R ∗ k 1−1/p ,
(24)
we obtain the minimal number of measurements required for kx − xk k2 ≤ D2 ∗ R ∗ k 1/2−1/p ,
exact recovery
where the constants D1 = (1/p − 1)−1 and D2 =
2k log(42/δ) + log(4/α) (2/p − 1)−1/2 . Note that (24) provides upper bounds on the
m = O( √ ), (19)
c0 (δ/ 2) two different norms of the recovery error kx − xk k. Combin-
then with probability exceeding 1 − α, ing this result with (22), the tail energy in a p-compressible
√ √ signal is upper bounded by
1 − δkxk2 ≤ kAxk2 ≤ 1 + δkxk2 (20)
ẽ ≤ 2D1 ∗ R ∗ k 1/2−1/p + kek2 . (25)
for all x ∈ χ.
Proof: See Appendix B. When the parameter p is enough small, the most term in the
When D is an overcomplete dictionary, one can use Lem- right hand (25) decays rapidly as the sparsity level k increases.
mma 1 to go beyond a single k-dimensional subspace to in-
stead considering all possible subspace spanned by k columns D. Recovery of Approximately Sparse-Dictionary Signals from
of D, thereby establishing the condition of D-RIP for A. Then, Incomplete Measurements
we have the following Lemma. Consider that signals have a sparse representation in an
Lemma 2: Let D be an overcomplete dictionary whose overcomplete dictionary D, we theoretically provide a guar-
dimension is n×d and fix δ, α ∈ (0, 1). we obtain the minimal antee for exact recovery of sparse-dictionary signals. Analo-
number of measurements required for exact recovery gously to the guarantees of GP algorithms, the proof relies on
2k log(42ed/δk + log(4/α) iteration invariant which indicates that the recovery error is
m = O( √ ) (21) mostly determined by the number of iterations.
c0 (δ/ 2)
6
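As a quick numerical illustration of (22)-(25) (our example, not from the paper; the parameter values are arbitrary), the following snippet builds a p-compressible vector, evaluates its tail energy (22), and compares it with the bound (25):

import numpy as np

p, R, n, k = 0.5, 1.0, 256, 10
x = R * np.arange(1, n + 1) ** (-1.0 / p)            # power-law decay as in (23)
x_k = np.where(np.arange(n) < k, x, 0.0)             # best k-term approximation
e_norm = 0.01                                        # assumed noise level ||e||_2
tail = np.linalg.norm(x - x_k) + np.linalg.norm(x - x_k, 1) / np.sqrt(k) + e_norm   # (22)
D1 = 1.0 / (1.0 / p - 1.0)
bound = 2 * D1 * R * k ** (0.5 - 1.0 / p) + e_norm   # (25)
print(f"tail energy = {tail:.4f}, bound (25) = {bound:.4f}")

For this choice of p and k the tail energy is already close to the noise level, which is the regime in which the algorithm's recovery error is dominated by ||e||_2.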

D. Recovery of Approximately Sparse-Dictionary Signals from Incomplete Measurements

Considering signals that have a sparse representation in an overcomplete dictionary D, we theoretically provide a guarantee for exact recovery of sparse-dictionary signals. Analogously to the guarantees for GP algorithms, the proof relies on an iteration invariant, which indicates that the recovery error is mostly determined by the number of iterations.

Before stating the main result for the algorithm (Theorem 3), we first state the following theorem.

Theorem 2: Assume that A satisfies the D-RIP with constant δ_{4k} < 0.1. Let S_D(x, k) be the near-optimal projection in (15) and x^{l+1} be the approximation after l + 1 iterations. If (1 + c_1)(1 − c_2^2/(1 + β)^2) < 1, the upper bound on the recovery error after l + 1 iterations is given by

||x − x^{l+1}||_2 ≤ η_1 ||e||_2,   (26)

where β is an arbitrary constant and η_1 is a constant which depends on c_1, c_2 and β.

Inspired by previous work in the signal-space setting, the conditions of Theorem 2 on the near-optimal projections hold in cases where D is not unitary, and especially in cases where D is so highly overcomplete/redundant that it cannot satisfy the traditional RIP. To the best of our knowledge, the classical GP algorithms are used to calculate the projections. Thus, we provide a stronger convergence result for the algorithm, even when the dictionary is highly overcomplete, in the following theorem.

Theorem 3: Let A be a sensing matrix satisfying the D-RIP of order 4k for coefficient vectors a such that x = Da. Then, the signal estimate x^{l+1} after l + 1 iterations of the algorithm satisfies

||x − x^{l+1}||_2 ≤ C_1 ||x − x^l||_2 + C_2 ||e||_2

with

C_1 = ((2 + λ_1)δ_{4k} + λ_1)(2 + λ_2) √((1 + δ_{4k})/(1 − δ_{4k})),
C_2 = (2 + λ_2)((2 + λ_1)(1 + δ_{4k}) + 2) / √(1 − δ_{4k}).   (27)

Proof: The proof follows that of Theorem II.1 in [36].

Notice that the constants C_1 and C_2 depend on the isometry constant δ_{4k} and on the approximation parameters λ_1 and λ_2. Further, an immediate consequence of Theorem 3 is the following.

Lemma 3: Assume the conditions of Theorem 3. Then, after a constant number of iterations l + 1 = ⌈ log(||x||_2/||e||_2) / log(1/C_1) ⌉, it holds that

||x − x^{l+1}||_2 ≤ (1 + (1 − C_1^{l+1})/(1 − C_1)) C_2 ||e||_2.   (28)

Proof: See Appendix C.

Notice that Lemma 3 implies the result of Theorem 2 with η_1 = (1 + (1 − C_1^{l+1})/(1 − C_1)) C_2.

More specifically, through various combinations of c_1, c_2 and δ_{4k}, Theorem 3 shows that C_1 < 1 and the accuracy of the algorithm improves at each iteration. In particular, we obtain C_1 ≤ 0.5 and C_2 ≤ 7.5 if λ_1 = 1/10, λ_2 = 1, and δ_{4k} ≤ 0.1. Applying the recursive nature of Theorem 3, we have the following lemma.

Lemma 4: Suppose that the conditions of Theorem 3 hold with constant δ_{4k} ≤ 0.1. At each iteration of the algorithm, the signal estimate x^l after l iterations is k-sparse, and

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + 7.5 ||e||_2.

In particular,

||x − x^l||_2 ≤ 2^{−l} ||x||_2 + 15 ||e||_2.   (29)

Each iteration of the algorithm reduces the recovery error by a constant factor, while adding an additional noise component. By taking a sufficient number of iterations l, the term 2^{−l}||x||_2 can be made as small as desired, and ultimately the recovery error is proportional to the noise level in the noisy measurements. If the accurate S_D is provided, the upper bound on the recovery error in (29) also matches commonly used results.
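As a small worked example of the geometric decay in Lemmas 3 and 4 (ours, not a result from the paper), the number of iterations needed before the error floor is reached can be computed directly:

import numpy as np

x_norm, e_norm, C1 = 1.0, 1e-3, 0.5            # assumed signal and noise norms
# Lemma 3: l + 1 = ceil(log(||x||_2/||e||_2) / log(1/C1)) iterations
iters = int(np.ceil(np.log(x_norm / e_norm) / np.log(1.0 / C1)))
# Lemma 4: the error after roughly this many iterations is at most 2^{-l}||x||_2 + 15||e||_2
bound = 2.0 ** (-iters) * x_norm + 15 * e_norm
print(iters, bound)                             # about 10 iterations, bound on the order of ||e||_2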

E. Recovery of Approximately Arbitrary Signals from Incomplete Measurements

As shown in the proof of Theorem 3, in the case where the signals have a sparse representation in D, smaller values of c_1 and c_2 result in a more accurate recovery, and it is possible to achieve recovery as accurate as desired for small enough ||e||_2. However, this is not the case when signals do not exactly have a sparse representation in D, that is, when

y = A(Da_k) + A(x − Da_k) + e = A(Da_k) + ê.   (30)

Notice that the term ê = A(x − Da_k) + e can be viewed as the noise in the noisy measurements of the k-sparse signal Da_k with ||a_k||_0 ≤ k. In fact, the "new" noise ê bounds the maximum achievable accuracy. For the sake of illustration, the conditions of Lemma 4 still hold. Further, we state two theorems and two lemmas in this section, which can be considered as extensions of Theorem 3 and its Lemma 4 to this case.

First, we state the following lemma, which can be considered as a generalization of Lemma 4.

Lemma 5: For the general CS model y = A(Da_k) + ê in (30), if δ_{4k} < 0.1, the upper bound on the recovery error is given by

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + ||x − Da_k||_2 + 7.5 ||A(x − Da_k)||_2 + 7.5 ||e||_2.

In particular,

||x − x^l||_2 ≤ 2^{−l} ||Da_k||_2 + ||x − Da_k||_2 + 15 ||A(x − Da_k)||_2 + 15 ||e||_2,   (31)

where a_k is the best k-sparse approximation with ||a_k||_0 ≤ k.
Proof: See Appendix D.

Notice that the coefficient vector a_k is chosen to minimize the upper bound of (31), which indicates that a_k is still important for sparse signal recovery, specifically from noisy measurements. From Lemma 5, the term ||A(x − Da_k)||_2 can be used to prove the convergence of the algorithm when D is not unitary. A modification of (31) implies that there exists an upper bound on the term ||A(x − Da_k)||_2 in the signal space, as stated in the following theorem.

Theorem 4: Suppose that A satisfies the upper bound of the RIP with constant δ_{4k} < 0.1. Then, for any vector x ∈ R^n,

||Ax||_2 ≤ √(1 + δ_k) ( ||x||_2 + ||x||_1 / √k ).   (32)

Proof: See Appendix E.

Using this theorem to bound the right-hand side of (31), we derive

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + 7.5 ||e||_2 + (7.5 √(1 + δ_k) + 1) ||x − Da_k||_2 + (7.5 √(1 + δ_k)/√k) ||x − Da_k||_1.

In particular,

||x − x^l||_2 ≤ 2^{−l} ||Da_k||_2 + 15 ||e||_2 + (15 √(1 + δ_k) + 1) ||x − Da_k||_2 + (15 √(1 + δ_k)/√k) ||x − Da_k||_1.   (33)

Denote by

M(x) := inf_{a_k: ||a_k||_0 ≤ k} ( ||x − Da_k||_2 + ||x − Da_k||_1 / √k )   (34)

the model-mismatch quantity (for any x ∈ R^n). Notice that (32) and (34) have a very similar form even in the case where D is not an overcomplete dictionary. Combining this result with (33), we have

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + 7.5 ||e||_2 + 8.5 √(1 + δ_k) M(x).

In particular,

||x − x^l||_2 ≤ 2^{−l} ||Da_k||_2 + 15 ||e||_2 + 16 √(1 + δ_k) M(x).   (35)

Notice that this quantity bounds the above recovery error. If the quantity is large, then the signal is not a k-sparse or compressible signal, i.e. x ≠ Da_k, which means that the signal does not exactly have a sparse representation in D.

Similarly, we derive an upper bound on the recovery error in terms of the tail energy, as stated in the following theorem.

Theorem 5: Let A be a sensing matrix satisfying the D-RIP and assume that δ_{4k} < 0.1. Given the modification of (33) for the general CS model, the upper bound on the recovery error is given by

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + 10 ẽ.

In particular,

||x − x^l||_2 ≤ 2^{−l} ||Da_k||_2 + 20 ẽ.   (36)

Proof: See Appendix F.

After l + 1 iterations, the term 2^{−l}||Da_k||_2 can be made small enough that lim_{l→∞} 2^{−l}||Da_k||_2 = 0, and the recovery error depends only on the tail energy, which implies that the algorithm makes significant progress at each iteration in this case.

Recall that the term ||A(x − Da_k)||_2 bounds the recovery error in (31). A modification of (31) implies that there exists an upper bound on the term ||A(x − Da_k)||_2 in the coefficient space, as stated in the following lemma.

Lemma 6: If AD satisfies the D-RIP with constant δ_{4k} < 0.1, then extending (32) yields

||A(x − x_k)||_2 = ||AD(a − a_k)||_2 ≤ √(1 + δ_k) ( ||a − a_k||_2 + ||a − a_k||_1 / √k ).   (37)

Using this lemma, in which the term ||AD(a − a_k)||_2 bounds the recovery error in the coefficient space, we derive

||x − x^{l+1}||_2 ≤ 0.5 ||x − x^l||_2 + 7.5 ||e||_2 + ||x − Da_k||_2 + 7.5 √(1 + δ_k) ( ||a − a_k||_2 + ||a − a_k||_1 / √k ).

In particular,

||x − x^l||_2 ≤ 2^{−l} ||Da_k||_2 + 15 ||e||_2 + ||x − Da_k||_2 + 15 √(1 + δ_k) ( ||a − a_k||_2 + ||a − a_k||_1 / √k ),   (38)

where a_k is the best k-sparse approximation of a. If the chosen a_k coincides with a, i.e. a is itself k-sparse, then the additional terms in the upper bound of (38) become small.

F. Computation Complexity of the Algorithm

In this section, we further obtain the following result regarding the convergence speed of the algorithm.

Recall that x̂ = x^{l+1} is the output of the algorithm after l + 1 iterations. Given a positive parameter η, the algorithm produces a signal estimate x̂ after at most O(log(||x||_2/η)) iterations such that

||x − x̂||_2 = O(η + ||e||_2) = O(max{η, ||e||_2}).   (39)

The cost of one iteration of the algorithm is dominated by the cost of steps S1 and S6 of the algorithm, as presented in Table I. The first step is to obtain the proxy u = A*r and the signal estimate x̃. The next step is to compute the support approximation S_D efficiently with a classical GP algorithm; OMP, ROMP, CoSaMP and SP can all be used to estimate S_D. The running time of these algorithms over an n × d dictionary D is O(knd) or O(nd). Therefore, the overall running time of these GP algorithms is O(knd log(||x||_2/η)) or O(nd log(||x||_2/η)). Notice that the dictionary D is overcomplete. For sparse signal recovery, these running times are in line with known bounds for the algorithm, which implies that the algorithm has the linear convergence shown in Fig. 3.

Interestingly, we now turn to the case where the number of measurements required is determined by reducing the approximation error for a prescribed recovery SNR (R-SNR). Given a sparse-dictionary signal x with ||x||_2 ≤ 2R, the upper bound on the SNR is given by

R-SNR = 10 log( ||x||_2 / ||x − x̂||_2 ) = 10 log( ||x||_2 / ||x − Dâ||_2 )
      ≤ 10 log( ||x||_2 / ||x − x_k||_2 ) = 10 log( ||x||_2 / ẽ )
      ≤ 10 log( 2R / (2 D_1 · R · k^{1/2−1/p}) ) = log(1/p − 1) + (1/p − 1/2) log k,   (40)

where x̂ is the approximation of x. The number of iterations required is O(log k). Therefore, if a fixed R-SNR is to be guaranteed, the overall running time of the algorithm is O(nd log(||x||_2/η) · SNR) in this case, which further implies that the computational complexity of the algorithm is nearly linear in the signal length.

Fig. 3: Convergence of the signal space subspace pursuit algorithm for three types of signals.

IV. SIMULATION RESULTS

This section tests the performance of the algorithm through a wide range of numerical experiments. First, when the dictionary is an orthonormal basis, the influence of the sparsity level and of the number of measurements on the recovery performance is studied. Recovery performance analyses are further conducted by comparing the algorithm with some recently developed and commonly used recovery algorithms, including OMP, ROMP, CoSaMP, SP and LP. Next, note that if the dictionary is not an orthonormal basis, the main difficulty in implementing our algorithm lies in calculating the projection of the signal onto a small number of dictionary atoms. To overcome this difficulty, we apply GP algorithms to approximate it. Over 1000 independent trials in all simulations, the algorithm using the signal space method outperforms the conventional algorithms.

In all experiments, simulated data are generated by taking the following steps:
1) Generate a k-sparse signal x of length n = 256 that is sparse in the dictionary D domain, i.e. x = Da. Its coefficient vector a has k nonzero entries whose magnitudes are Gaussian distributed and whose locations are chosen uniformly at random.
2) Generate a sensing matrix A ∈ R^{m×n} whose entries are independently drawn from a Gaussian distribution.
3) Compute the measurements as y = Ax or y = Ax + e.
After the simulated data are generated, the above-mentioned algorithms are used to recover the k-sparse signal x from the given A and y.

To evaluate the estimation quality, two indices, Rrec and Rres, are commonly used. First, the recovery error Rrec is defined by

Rrec(x, x̂) = ||x − x̂||_2 / ||x||_2 ≤ ε_1.   (41)

We say that a signal x is exactly recovered when the signal estimate x̂ satisfies ||x − x̂||_2 ≤ 10^{−4} ||x||_2.
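The following sketch (ours; illustrative, not the authors' experiment code) reproduces the data-generation steps 1)-3) and the recovery criterion below (41) for a single trial, here with an identity dictionary as a stand-in for D:

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 80, 10
D = np.eye(n)                                    # placeholder dictionary for this sketch
a = np.zeros(n)
supp = rng.choice(n, size=k, replace=False)      # step 1: random support locations
a[supp] = rng.standard_normal(k)                 #         Gaussian-magnitude coefficients
x = D @ a
A = rng.standard_normal((m, n)) / np.sqrt(m)     # step 2: Gaussian sensing matrix
e = 0.01 * rng.standard_normal(m)
y = A @ x + e                                    # step 3: (noisy) measurements

def exactly_recovered(x, x_hat):
    # criterion below (41): exact recovery if ||x - x_hat||_2 <= 1e-4 ||x||_2
    return np.linalg.norm(x - x_hat) <= 1e-4 * np.linalg.norm(x)

Repeating such trials 1000 times for each m and counting exactly_recovered outcomes yields the recovery-frequency curves reported in Figs. 4 and 5.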

A. Simulation Results on Analysing the Recovery Performance of the Algorithm under a Renormalized Orthogonal Dictionary

In the first experiment, we evaluate the recovery performance of the algorithm and compare it with that of the five existing algorithms mentioned above. Note that the matrix D is an orthogonal but not normalized basis. The signal x of length n = 256 is sparse in the dictionary domain, i.e. x = Da, where the dictionary D is a 256 × 256 matrix. Its coefficient vector a has k = 10 nonzero entries whose magnitudes are Gaussian distributed and whose locations are chosen uniformly at random. We investigate the frequency of signal recovery as a function of the number of measurements. Simulation results are shown in Fig. 4.

Recall that problem (14) is NP-hard in our analytical framework because it requires examining all k-subsets of the columns of D. To calculate Λ_opt(x, k) with such a dictionary, we utilize the column norms of D to identify the k largest nonzero entries of the analysis vector D*x and their corresponding supports, which means that Λ is set equal to the positions of the k largest entries.

As can be seen in Fig. 4(a), in the noise-free case, the algorithm improves the signal recovery frequency significantly compared to those of the five existing algorithms. For example, the algorithm recovers a k-sparse signal x with more than 90% frequency once the number of measurements reaches m = 55, whereas the LP-minimization algorithm, with its high computational complexity, is able to recover x under the same recovery-frequency constraint only once the number of measurements reaches m = 60. Moreover, as can be seen in Fig. 4(b), in the noisy case, the algorithm outperforms the other algorithms by optimally approximating the supports. Further, as can be seen in Fig. 4, the algorithm utilizes the matrix AD to recover the coefficient vector a because of the non-normalized columns of D, while the other four existing GP algorithms almost never recover the correct signal.

Fig. 4: Performance comparison of perfect signal recovery frequency for signals having a k = 10 sparse representation in a renormalized orthogonal dictionary D: (a) with noise-free measurements; (b) with noisy measurements.

B. Simulation Results on Analysing the Recovery Performance of the Algorithm under an Overcomplete Dictionary

In the second experiment, we check the effect of the algorithm with different support estimation techniques, both for the case where the k nonzero entries of a are well separated and for the case where they are clustered, and make comparisons with four other existing algorithms: OMP, CoSaMP, SP and LP. Note that the matrix D is a 4× overcomplete DFT dictionary; thus, neighbouring columns are highly coherent in this dictionary. We fix the sparsity level k = 8 and investigate the frequency of signal recovery as a function of the number of measurements m. Simulation results are shown in Fig. 5.

As discussed in Section III-A, the main difficulty in implementing the algorithm is in calculating Λ_opt(x, k). One such projection is required in step S1, as shown in Table I; another is required in step S4. To overcome this difficulty, we apply classical CS algorithms such as OMP, SP, CoSaMP and LP to calculate the near-optimal supports S_D(x, k). For short, we label the variant 'SSSP(OMP)' when OMP is used for calculating S_D(x, k), 'SSSP(SP)' when SP is used, and so forth.

As can be seen in Fig. 5(a), we compare the performance of eight different algorithms for the case where the nonzero entries of a are well separated. Fig. 5(a) shows that SSSP(LP) performs better than the other algorithms when a classical algorithm like LP is used for the near-optimal projection S_D(x, k). This is because LP is able to find Λ_opt(x, k) exactly when x = P_{Λ_opt(x,k)} x and the nonzero entries of Λ_opt(x, k) are sufficiently well separated. Also, Fig. 5(a) shows that OMP, CoSaMP, and SP are not efficient for signal recovery in this case, because the sensing matrix A and the overcomplete dictionary D are highly coherent, which indicates that the combined matrix AD cannot satisfy the RIP.

As can be seen in Fig. 5(b), we compare the performance of eight different algorithms for the case where the nonzero entries of a are clustered. Fig. 5(b) shows that SSSP(CoSaMP) performs better than the other algorithms when CoSaMP is used for the near-optimal projection S_D(x, k). This is because CoSaMP selects the 2k largest nonzero entries during each iteration and is therefore little affected by the coherence of neighbouring active columns in D. Also, Fig. 5(b) shows that SSSP(OMP) and OMP always fail as m increases in this case, because OMP is designed to select one index at each iteration, which is not effective for recovering the correct support and is strongly affected by the high coherence between close atoms in and around the cluster. It can be seen from Fig. 5 that the algorithm yields accurate recovery whereas LP and OMP do not perform well at all when the support of x is clustered, and the exact opposite behavior is seen when the support has enough separation. Generally speaking, the algorithm variants outperform the corresponding classical CS algorithms.

V. CONCLUSION

In this paper, we present support estimation techniques for a greedy sparse-dictionary signal recovery algorithm. Using this method, we propose the signal space subspace pursuit algorithm based on the signal space method and establish theoretical signal recovery guarantees. We observe that the accuracy of the algorithm in this setting depends on the signal structure, even though the conventional recovery guarantees are independent of the signal structure. We analyze the behavior of the signal space method when the dictionary is highly overcomplete and thus does not satisfy typical conditions like the RIP or incoherence. Under specific assumptions on the signal structure, we demonstrate that the signal space method can be used to optimally approximate projections. Thus, our analysis provides theoretical backing for the observed phenomena. According to the simulation results, and through comparison with several other commonly used algorithms, the algorithm achieves outstanding recovery performance in both the noise-free and noisy cases.

APPENDIX

A. Proof of the claim in Definition 3: if the dictionary is orthonormal, the value of the localization factor is one

To complete the proof, we introduce the following theorem.

Theorem A.1: Suppose that x ∈ Σ_k; then

||x||_1 / √k ≤ ||x||_2 ≤ √k ||x||_∞.   (42)

Proof: For any x, ||x||_1 = |⟨x, sgn(x)⟩|. By applying the Cauchy-Schwarz inequality we obtain ||x||_1 ≤ ||x||_2 ||sgn(x)||_2. The lower bound follows since sgn(x) has at most k nonzero entries, all equal to ±1 (where x ∈ Σ_k), and thus the l2-norm of sgn(x) is √k. The upper bound is obtained by observing that each of the k largest entries of x can be upper bounded by ||x||_∞.

Proof: We now bound the right-hand side of (11). Note that ||x||_1/√k ≤ ||x||_2 by Theorem A.1. Then we have

η = ||D*Dx||_1 / √k ≤ ||D*Dx||_2.   (43)

(a) The last equation holds because kDxk2 = 1. Thus, the (46)
is equivalent to 1 ≤ kDk2 . In particular, we need
kDk2 = 1, (47)
where the equation follows the fact that D is orthonormal and
hence kDk2 = 1 i.e, D = I. This completes the proof of
condition of the Definition 3.

B. Proof of Lemma 1
To complete the proof, we introduce the following lemma.
Lemma B.1: Let A ∈ Rm×n be a random matrix
following any distribution satisfy the condition of (18). Given
the assumptions for any given set T with kT k0 ≤ k and
δ ∈ (0, 1), we have
(1 − δ)kxk2 ≤ kAxk2 ≤ (1 + δ)kxk2 (x ∈ XT ) (48)
with probability at least
(b)
4
≥ 1 − 4( )e−c0 (ε)m . (49)
a
where XT is the set of all vectors in Rn indexed by T .
Proof: Note that kxk2 = 1 in this case. Thus (1 − δ) ≤
kAxk2 ≤ (1+δ). Assume that all the vectors q are normalized,
i.e. kqk2 = 1 for a finite set of points QT with QT ⊆ XT .
Then, we have
δ 4
min kx − qk2 ≤ (with kQT k0 ≤ ). (50)
q∈QT 4 a
Applying (18) for the set of points with the parameter ε = 2δ
and the probability exceeding the right side of (55) result in
δ 2 2 δ 2
(1 − ) kqk2 ≤ kAqk2 ≤ (1 + ) kqk2 (q ∈ QT ). (51)
2 2
To simplify the derivation, notice that (51) can be trivially
2
Fig. 5: Frequency of signal recovery out of 1000 trials for represented without the quadratic constraint on kAqk2 and
2
different SSSP variants when the nonzero entries in a are well kqk2 . The inequality (51) is equivalent to requiring
separated (a) and when the nonzero entries in a are clustered δ δ
(b). Here, k = 8, n = 256, d = 1024, the dictionary D ∈ (1 − )kqk2 ≤ kAqk2 ≤ (1 + )kqk2 (q ∈ QT ). (52)
2 2
Rn×d is a 4× overcomplete DFT and A ∈ Rm×n is a Gaussian
matrix. Since δmin is the smallest number, thus, we have
kAxk2 ≤ (1 + δmin )kxk2 (x ∈ XT ) (53)

Notice that kηk2 = 1 in this setting where D is orthonormal. The assumption that δmin is the smallest number implies
Combining this result in (43), we have that δmin ≤ δ. Recall that the vectors x are normalized, i.e.
kxk2 = 1. For a given set point q ∈ QT , the inequality (4)
η=1≤ sup kD ∗ Dxk2 . (44) holds if the following inequality holds
kDxk2 =1,kxk0 ≤k
δ
kx − qk2 ≤ . (54)
4
Using the Cauchy-Schwarz inequality again, we can get
Combining (53) and (54), we have
sup kD ∗ Dxk2 ≤ sup kDk2 kDxk2 . δ δ δ
kDxk2 =1,kxk0 ≤k kDxk2 =1,kxk0 ≤k kAxk2 ≤ kAqk2 + kA(x − q)k2 ≤ 1 + + (1 + ) . (55)
2 2 4
(45)
Combing (44) and (45) we see that Because δmin is the smallest number for which (53) holds,
the inequality (55) satisfy the following condition
1≤ sup kDk2 kDxk2 = sup kDk2 . (46) 3 δ
kDxk2 =1,kxk0 ≤k kxk0 ≤k
δmin ≤ δ(1 − ) ≤ δ. (56)
4 4

Thus, we complete the proof of the upper bound of (51). D. Proof of Lemma 5
Similarly, according to the definition of δmin , we derive the Recall that (29) in general CS model (30) is equivalent to
lower bound of (51) requiring
δ δ xk − xl+1 = Dak − xl+1
kAxk2 ≥ kAxk2 − kA(x − q)k2 ≥ 1 − − (1 + δ) ≥ 1 − δ. 2 2
2 4
(57) ≤ 0.5 Dak − xl+1 2
+ 7.5kA(x − Dak ) + ek2
Proof: Assume that there exists Cnk ≤ ( 42
δ ) 2k
such sub- ≤ 0.5 Dak − xl+1 + 7.5kA(x − Dak )k2 + 7.5kek2 .
2
spaces. The lemma B.1 shows that with probability at least (64)
42 2k 4 −c0 ( 2 )m Using the triangle inequality, we can get
≤ 4( ) ( )e δ . (58)
δ a
x − xl+1 2
= x − Dak + Dak − xl+1 2
(65)
If k ≤ c1 m/ log(n/k), then ≤ kx − Dak k2 + Dak − xl+1 .
2
δ 42 4
4e−c0 ( 2 )m+2k log( δ )+log( a )
≤ 4e−cc m , (59) Combing this results with (64), we obtain
x − xl+1 2
≤ 0.5 Dak − xl 2
+ kx − Dak k2
where both c1 and c1 are positive constants. The next step is to (66)
simply the both sides of (59) by leaving out the denominator + 7.5kA(x − Dak k2 + 7.5kek2 .
exponential term 4e such that Note that k contains the indices of the k largest entries in x.
Thus, xk is a best k-sparse approximate to x, i.e. kDak k2 ≤
δ 2k 42 1 4
c2 ≤ c0 ( )m − (log( ) + log( )) kxk2 . Using this to bound the right side of (66) yields
2 m δ k a
δ 2 log(42/δ) 2 log(a/4) x − xl+1 ≤ 0.5 x − xl + kx − Dak k2
≤ c0 ( )m − c1 ( + ) (60) 2 2
(67)
2 log(n/k) log(n/k) + 7.5kA(x − Dak k2 + 7.5kek2 .
k
δ 2(log(42/δ) + log (a/4) )
≤ c0 ( )m − c1 ( ) Repeat the same steps above, similarly, we can derive
2 log(n/k)
x − xl 2
≤ 2−l kDak k2 + kx − Dak k2
It is sufficient to choose c2 > 0 if c1 is enough small. This (68)
completes the proof of Lemma 1. + 15kA(x − Dak )k2 + 15kek2 .
This completes the proof of Lemma 5

C. Proof of Lemma 3
E. Proof of Theorem 4
l Proof: Recallm that when the number of iterations l + 1 = To complete the proof, we introduce the following Lemma.
log(kxk2 /kek2 )
log(1/C1 ) holds, it can be derived that Lemma E.1: Let Λ0 be an arbitrary subset of {1, 2, ..., n}
such that |Λ0 | ≤ k. For any signal x ∈ Rn , we define Λ1 as
x − xl+1 2
≤ C1l+1 x − x0 2 the index set corresponding to the k largest entries of xΛc0 (in
(61)
+ (1 + p + p2 + ... + pl )C2 kek2 . absolute value), Λ2 as the index set corresponding to the next
k largest entries, and so on. Then
To obtain the second bound in Lemma 3, we simply solve n
X xΛc0 1
the error recursion and note that kxΛi k2 ≤ √ . (69)
i≥2
k
1 − C1 l+1
(1+p+p2 +...+pl )C2 kek2 ≤ (1+ )C2 kek2 . (62) Proof: We begin by observing that for i ≥ 2,
1 − C1
xΛi−1 1
Combining (61) and (62) we see that kxΛi k∞ ≤ √ , (70)
k
1 − C1l+1 since the Λi sort x to have decreasing magnitude. Recall that
x − xl+1 2
≤ C1l x − x0 2
+ (1 + )C2 kek2
1 − C1 when (42) still holds, we can derive
1 − C1l+1 n √ X n n
≤ (1 + )C2 kek2 . X 1 X xΛc
1 − C1 kxΛi k2 ≤ k kxΛi k∞ ≤ √ kxΛi k1 = √0 1 .
(63) i≥2 i≥2
k i≥1 k
(71)
It follows that after finite iterations, the upper bound of (63) Proof: We begin by partitioning the signal (vector) x into
closely depend on the last inequality due to the equation of the vectors {xΛ1 , xΛ2 , ..., xΛn } in decreasing order of magnitude.
geometric series, the choice of lit , and the fact that x0 = 0. Subsets {Λ1 , Λ2 , ..., Λn } with length |Λ|0 ≤ k are cho-
This completes the proof of Lemma 3. sen such that they are all disjointed. Note that kAxk2 =

n
P
AΛ i x Λ i . Combing this with the upper bound of (4), Repeat the same steps above, similarly, we have
i=1 2
we have x − xl 2
≤ 2−l kDak k2 + 20ẽ. (78)
n n
X X This completes the proof of Theorem 5.
kAxk2 = AΛi xΛi ≤ kAΛi xΛ1 k2
i=1 2 i=1
n p
X p n
X ACKNOWLEDGMENT
≤ 1 + δk kxΛi k2 < 1 + δK (kxΛ1 k2 + kxΛi k2 ). The authors would like to thank Prof. Xu Ma and the anony-
i=1 i=2 mous reviewers for their insightful comments and constructive
(72) suggestions which have greatly improved the paper.
Combining (71) and (72), we see that
n R EFERENCES
X
kAxk2 = AΛi xΛi [1] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE
i=1
transactions on information theory, vol. 51, no. 12, pp. 4203–4215,
2
2005.
p xΛc0 1 (73) [2] D. L. Donoho, “Compressed sensing,” IEEE Transactions on information
< √
1 + δK (kxΛ1 k2 +) theory, vol. 52, no. 4, pp. 1289–1306, 2006.
k [3] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof
p kxΛ k of the restricted isometry property for random matrices,” Constructive
≤ 1 + δK (kxΛ1 k2 + √1 1 ). Approximation, vol. 28, no. 3, pp. 253–263, 2008.
k [4] P. A. Randall, Sparse recovery via convex optimization. California
Note that Λi contains the indices of the kΛi k0 ≤ k largest Institute of Technology, 2009.
[5] E. J. Candès and B. Recht, “Exact matrix completion via convex
entries in xΛ1 . Thus, maybe xk is a best k-sparse approximate optimization,” Foundations of Computational mathematics, vol. 9, no. 6,
to x, i.e. kxΛ1 k2 ≤ kxk2 . Using this to bound the right side p. 717, 2009.
of the last inequality (73) yields [6] Y. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms
in convex programming. Siam, 1994, vol. 13.
p kxΛ k p kxk [7] C. Soussen, J. Idier, J. Duan, and D. Brie, “Homotopy based algorithms
1 + δK (kxΛ1 k2 + √1 1 ) ≤ 1 + δK (kxk2 + √ 1 ). for l0-regularized least-squares,” hand, vol. 2, 2015.
k k [8] T. Blumensath and M. E. Davies, “Gradient pursuits,” IEEE Transactions
(74) on Signal Processing, vol. 56, no. 6, pp. 2370–2382, 2008.
Combining the above two inequalities yields (32). This [9] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simulta-
neous sparse approximation. part i: Greedy pursuit,” Signal Processing,
completes the proof of Theorem 4. vol. 86, no. 3, pp. 572–588, 2006.
[10] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency
dictionaries,” IEEE Transactions on signal processing, vol. 41, no. 12,
F. Proof of Theorem 5 pp. 3397–3415, 1993.
[11] J. A. Tropp and A. C. Gilbert, “Signal recovery from random mea-
To complete the proof, we introduce the following Lemma. surements via orthogonal matching pursuit,” IEEE Transactions on
Lemma F.1: Let x be an arbitrary signal in Rn . The information theory, vol. 53, no. 12, pp. 4655–4666, 2007.
measurements with noise perturbation y = Ax + e can also be [12] D. Needell and R. Vershynin, “Signal recovery from incomplete and
inaccurate measurements via regularized orthogonal matching pursuit,”
denoted as y = Axk + ê where IEEE Journal of selected topics in signal processing, vol. 4, no. 2, pp.
310–316, 2010.
kx − xk k1 [13] D. Needell and J. A. Tropp, “Cosamp: Iterative signal recovery from in-
kêk2 ≤ 1.14(kx − xk k2 + √ ) + kek2 . (75)
k complete and inaccurate samples,” Applied and computational harmonic
analysis, vol. 26, no. 3, pp. 301–321, 2009.
Proof: Notice that the term kA(x − Dak )k2 + kek2 ulti- [14] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sens-
mately bounds the recovery error in (31). Combining this with ing signal reconstruction,” IEEE transactions on Information Theory,
vol. 55, no. 5, pp. 2230–2249, 2009.
(32), we have [15] D. L. Donoho, “Sparse components of images and optimal atomic
decompositions,” Constructive Approximation, vol. 17, no. 3, pp. 353–
kA(x − xk )k2 + kek2 382, 2001.
[16] A. C. Gilbert and J. A. Tropp, “Applications of sparse approximation in
p
≤ 1 + δk kx − xk k2 + kek2
(76) communications,” in Information Theory, 2005. ISIT 2005. Proceedings.
p kx − xk k1 International Symposium on, 2005, pp. 1000–1004.
≤ 1 + δk (kx − xk k2 + √ ) + kek2 , [17] B. D. Rao, “Signal processing with the sparseness constraint,” in IEEE
k International Conference on Acoustics, Speech and Signal Processing,
1998, pp. 1861–1864 vol.3.
√ inequality (76) follows from the fact that δk <
where the last [18] T. B. Cilingiroglu, A. Uyar, A. Tuysuzoglu, W. C. Karl, J. Konrad,
1/3 hence 1 + δk ≤ 1.14. B. B. Goldberg, and nl MS, “Dictionary-based image reconstruction for
Proof: Recall that (31) in Lemma 5. Thus, the recovery error superresolution in integrated circuit imaging.” Optics Express, vol. 23,
no. 11, pp. 15 072–87, 2015.
for such a signal estimation can be bounded from above as [19] N. R. Reyes, P. V. Candeas, and F. L. Ferreras, “Wavelet-based approach
for transient modeling with application to parametric audio coding,”
x − xl+1 2
≤ 0.5 x − xl 2
+ kx − Dak k2 Digital Signal Processing, vol. 20, no. 1, pp. 123–132, 2010.
+ 7.5(kA(x − Dak k2 + kek2 ) [20] J. L. Lin, W. L. Hwang, and S. C. Pei, “Video compression based on
orthonormal matching pursuits,” in IEEE International Symposium on
8.55 Circuits and Systems, 2006. ISCAS 2006. Proceedings, 2006, pp. 4 pp.–
≤ 0.5 x − xl + 9.55kx − Dak k2 + √ kx − Dak k1
2 5426.
k
[21] D. Malioutov, M. Cetin, and A. S. Willsky, “A sparse signal recon-
+ 7.5kek2 < 0.5 x − xl 2 + 10ẽ. struction perspective for source localization with sensor arrays,” IEEE
(77) Transactions on Signal Processing, vol. 53, no. 8, pp. 3010–3022, 2005.

[22] S. Qian and D. Chen, “Signal representation using adaptive normalized


gaussian functions,” Signal Processing, vol. 36, no. 1, pp. 1–11, 1994.
[23] J. Justesen, “Class of constructive asymptotically good algebraic codes,”
IEEE Transactions on Information Theory, vol. 18, no. 5, pp. 652–656,
2003.
[24] S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by
basis pursuit. siam j sci comput,” Siam Journal on Scientific Computing,
vol. 20, no. 1, pp. 33–61, 1998.
[25] J. L. Starck, E. J. Candès, and D. L. Donoho, “The curvelet transform for
image denoising,” IEEE Transactions on Image Processing A Publica-
tion of the IEEE Signal Processing Society, vol. 11, no. 6, pp. 670–84,
2002.
[26] B. A. Olshausen, “Learning real and complex overcomplete represen-
tations from the statistics of natural images,” Proc Spie, vol. 7446, pp.
74 460S–74 460S–11, 2009.
[27] S. Nirmala and K. R. Chetan, A New Curvelet Based Blind Semi-
fragile Watermarking Scheme for Authentication and Tamper Detection
of Digital Images. Springer India, 2016.
[28] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE
Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[29] I. Ram, I. Cohen, and M. Elad, “Facial image compression using patch-
ordering-based adaptive wavelet transform,” IEEE Signal Processing
Letters, vol. 21, no. 10, pp. 1270–1274, 2014.
[30] G. Shao, Y. Wu, A. Yong, X. Liu, and T. Guo, “Fingerprint compression
based on sparse representation,” IEEE Trans Image Process, vol. 23,
no. 2, pp. 489–501, 2014.
[31] J. Hou, L. P. Chau, Y. He, and N. Magnenat-Thalmann, “Expression-
invariant and sparse representation for mesh-based compression for 3-d
face models,” in Visual Communications and Image Processing, 2014,
pp. 1–6.
[32] CHEN, Yuan, ZHANG, Rong, YIN, and Dong, “Multi-polarimetric sar
image compression based on sparse representation,” in International
Conference on Audio, Language and Image Processing, 2012, pp. 705–
709.
[33] İ. Ülkü and B. U. Töreyin, “Lossy compression of hyperspectral images
using online learning based sparse coding,” in International Workshop
on Computational Intelligence for Multimedia Understanding, 2015, pp.
1–5.
[34] R. Giryes and D. Needell, “Greedy signal space methods for incoherence
and beyond,” Applied and Computational Harmonic Analysis, vol. 39,
no. 1, pp. 1–20, 2015.
[35] X. GU and S. TU, “On practical approximate projection schemes in
signal space methods,” 2016.
[36] M. A. Davenport, D. Needell, and M. B. Wakin, “Signal space cosamp
for sparse recovery with redundant dictionaries,” IEEE Transactions on
Information Theory, vol. 59, no. 10, pp. 6820–6829, 2013.
[37] R. Zhang and S. Li, “Optimal d-rip bounds in compressed sensing,” Acta
Mathematica Sinica, English Series, vol. 31, no. 5, pp. 755–766, 2015.
