Abstract—Parameter estimation from multiple measurement vectors (MMVs) is a fundamental problem in many signal processing applications, e.g., spectral analysis and direction-of-arrival estimation. Recently, this problem has been addressed using prior information in form of a jointly sparse signal structure. A prominent approach for exploiting joint sparsity considers mixed-norm minimization in which, however, the problem size grows with the number of measurements and the desired resolution, respectively. In this work, we derive an equivalent, compact reformulation of the ℓ2,1 mixed-norm minimization problem that provides new insights on the relation between different existing approaches for jointly sparse signal reconstruction. The reformulation builds upon a compact parameterization, which models the row-norms of the sparse signal representation as parameters of interest, resulting in a significant reduction of the MMV problem size. Given the sparse vector of row-norms, the jointly sparse signal can be computed from the MMVs in closed form. For the special case of uniform linear sampling, we present an extension of the compact formulation for gridless parameter estimation by means of semidefinite programming. Furthermore, we prove in this case the exact equivalence between our compact problem formulation and the atomic-norm minimization. Additionally, for the case of irregular sampling or a large number of samples, we present a low complexity, grid-based implementation based on the coordinate descent method.

Index Terms—Multiple measurement vectors, joint sparsity, mixed-norm minimization, gridless estimation.

Manuscript received May 8, 2017; revised November 21, 2017; accepted December 11, 2017. Date of publication January 1, 2018; date of current version February 1, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gonzalo Mateos. This work was supported by the EXPRESS project within the DFG priority program CoSIP (DFG-SPP 1798). This paper was presented in part at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, March 2017. (Corresponding author: Christian Steffens.)

C. Steffens and M. Pesavento are with the Communication Systems Group, Technische Universität Darmstadt, Darmstadt 64283, Germany (e-mail: [email protected]; [email protected]).

M. E. Pfetsch is with the Discrete Optimization Group, Technische Universität Darmstadt, Darmstadt 64293, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2017.2788431

1053-587X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Sparse Signal Reconstruction (SSR) techniques have gained considerable research interest over the last decades [2]–[9]. Traditionally, SSR considers the problem of reconstructing a high-dimensional sparse signal vector from a low-dimensional Single Measurement Vector (SMV), which is characterized by an underdetermined system of linear equations. It has been shown that exploiting prior knowledge on the sparsity structure of the signal admits a unique solution to the underdetermined system. In the signal processing context, this implies that far fewer samples than postulated by the Shannon-Nyquist sampling theorem for bandlimited signals are required for perfect signal reconstruction [10]. While SSR under the classical ℓ0 formulation constitutes a combinatorial and NP-complete optimization problem, several methods exist to approximately solve the SSR problem. The most prominent methods are based on convex relaxation in terms of ℓ1 norm minimization, which makes the SSR problem computationally tractable while providing sufficient conditions for exact recovery [2]–[9], or on greedy methods, such as OMP [11], [12] and CoSaMP [13], which have low computational cost but provide reduced recovery guarantees.

In the context of parameter estimation, e.g., in Direction-Of-Arrival (DOA) estimation, the SSR problem has been extended to an infinite-dimensional vector space by means of total variation norm and atomic norm minimization [14]–[19], leading to gridless parameter estimation methods.

Besides the aforementioned SMV problem, many practical applications deal with the problem of finding a jointly sparse signal representation from Multiple Measurement Vectors (MMVs), also referred to as the multiple snapshot estimation problem. Similar to the SMV case, approximate methods for the MMV-based SSR problem include convex relaxation by means of mixed-norm minimization [20]–[23], and greedy methods [24], [25]. Recovery guarantees for the MMV case have been established in [26]–[29]. An extension to the infinite-dimensional vector space for MMV-based SSR, using atomic norm minimization, has been proposed in [30]–[32].

Apart from SSR, MMV-based parameter estimation is a classical problem in array signal processing [33], [34]. Prominent applications in array processing include beamforming and DOA estimation. Beamforming considers the problem of signal reconstruction in the presence of noise and interference, while DOA estimation falls within the concept of parameter estimation and is addressed, e.g., by the subspace-based MUSIC method [35]. The MUSIC method has been shown to perform asymptotically optimal [36] and offers the super-resolution property at tractable computational cost. On the other hand, in the non-asymptotic case of a low number of MMVs or correlated source signals, the performance of subspace-based estimation methods can drastically deteriorate, such that SSR techniques provide an attractive alternative for these scenarios [37]–[39]. In fact, due to similar objectives in SSR and array signal processing, strong links between the two fields of research have been established in the literature. The OMP has an array processing equivalent in
1484 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 6, MARCH 15, 2018
the CLEAN method [40] for source localization in radio astronomy, i.e., both methods rely on the same greedy estimation approach. In [25], [41] the authors present the FOCUSS method, which provides sparse estimates by iterative weighted norm minimization, with application to DOA estimation. SSR based on an ℓ2,0 mixed-norm approximation has been considered in [38], while a convex relaxation approach based on the ℓ2,1 mixed-norm has been proposed in [37]. DOA estimation based on second-order signal statistics has been addressed in [42], [43], where a sparse covariance matrix representation is exploited by application of a sparsity prior on the source covariance matrix, leading to an SMV-like sparse minimization problem. In [44]–[46] the authors propose the SPICE method, which is based on weighted covariance matching and constitutes a sparse estimation problem which does not require the assumption of a sparsity prior. Links between SPICE and SSR formulations have been established in [32], [45]–[48], which show that SPICE can be reformulated as an ℓ2,1 mixed-norm minimization problem.

In this paper we consider jointly sparse signal reconstruction from MMVs by means of the classical ℓ2,1 mixed-norm minimization problem, with application to DOA estimation in array signal processing. Compared to recently presented sparse methods such as SPICE [44]–[46] and atomic norm minimization [30]–[32], the classical ℓ2,1 formulation has the general shortcoming that its problem size grows with the number of measurements and the resolution requirement, respectively. Approaches to deal with the aforementioned problems have been presented, e.g., in [37], [49]. While the classical ℓ2,1 mixed-norm minimization problem has a large number of variables in the jointly sparse signal representation, in this paper we derive an equivalent problem reformulation based on a compact parameterization in which the optimization parameters represent the row-norms of the signal representation, rather than the signal matrix itself. We refer to this formulation as SPARse ROW-norm reconstruction (SPARROW). Given the sparse signal row-norms, the jointly sparse signal matrix is reconstructed from the MMVs in closed form. We point out that support recovery is determined by the sparse vector of row-norms and only relies on the sample covariance matrix instead of the MMVs themselves. In this sense we achieve a concentration of the optimization variables as well as the measurements, leading to a significantly reduced problem size in the case of a large number of MMVs. Using standard concepts of semidefinite programming, we derive a gridless implementation of our SPARROW formulation for application in uniform sampling scenarios and prove its equivalence to atomic norm minimization. Furthermore, we present a low complexity implementation of our grid-based SPARROW formulation based on the coordinate descent method, which is applicable to large and irregular sampling scenarios. To put our new problem formulation in context with other existing methods, we compare it to the SPICE method, and our results extend the existing links between SPICE and ℓ2,1 mixed-norm minimization. We conclude our presentation by a short numerical analysis of the computational cost of our proposed SPARROW formulation, which shows a significant reduction in the computational time of our proposed reformulation as compared to both equivalent formulations, the classical ℓ2,1 mixed-norm [20], [37] and the atomic norm [30]–[32] problem formulations.

In summary, our main contributions are the following:
- We derive an equivalent, compact reformulation of the classical ℓ2,1 mixed-norm minimization problem [20], [37], named SPARROW, with significantly reduced computational cost.
- We prove that a gridless implementation of the SPARROW formulation is equivalent to the atomic norm minimization problem [30]–[32], while having significantly reduced computational cost.
- We provide a low complexity implementation of the compact SPARROW formulation, based on the coordinate descent method, for application in large and irregular sampling scenarios, which shows improved convergence as compared to the non-compact case.
- We extend the available results on theoretical links between the ℓ2,1 mixed-norm minimization problem and the SPICE method [44]–[46].

The paper is organized as follows: In Section II we present the sensor array signal model. A short review of the classical ℓ2,1 mixed-norm minimization problem and the atomic norm minimization problem is provided in Section III, before the equivalent, compact SPARROW formulation is introduced in Section IV. A low complexity implementation of the SPARROW formulation is derived in Section V. Section VI provides a theoretical comparison of the SPARROW formulation and the SPICE method. Simulation results for comparison of the computational cost of the various formulations are presented in Section VII. Conclusions are provided in Section VIII.

Notation: Boldface uppercase letters X denote matrices, boldface lowercase letters x denote column vectors, and regular letters x, N denote scalars, with j denoting the imaginary unit. Superscripts X^T and X^H denote transpose and conjugate transpose of a matrix X, respectively. The sets of diagonal and nonnegative diagonal matrices are denoted as D and D+, respectively. We write [X]_{m,n} to indicate the element in the mth row and nth column of matrix X. The statistical expectation of a random variable x is denoted as E{x}, and the trace of a matrix X is referred to as Tr(X). The Frobenius norm and the ℓ_{p,q} mixed-norm of a matrix X are referred to as ‖X‖_F and ‖X‖_{p,q}, respectively, while the ℓ_p norm of a vector x is denoted as ‖x‖_p. Toep(u) describes a Hermitian Toeplitz matrix with u as its first column, and diag(x) denotes a diagonal matrix with the elements in x on its main diagonal.

II. SIGNAL MODEL

Consider a linear array of M omnidirectional sensors, as depicted in Fig. 1. Further, assume a set of L narrowband far-field sources in angular directions θ_1, . . . , θ_L, summarized as θ = [θ_1, . . . , θ_L]^T. The corresponding spatial frequencies are defined as

    µ_l = cos θ_l ∈ [−1, 1),    (1)
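For illustration, the mapping (1) can be sketched numerically. The steering-vector definition below is an assumption for this sketch (a standard half-wavelength ULA with sensor positions 0, 1, ..., M − 1, phase-referenced to the first sensor); the actual array manifold is specified in Section II.

```python
import numpy as np

def steering_vector(mu, M):
    # Assumed half-wavelength ULA with sensor positions 0, 1, ..., M-1
    # (hypothetical; the paper's array manifold is defined in Section II).
    return np.exp(1j * np.pi * np.arange(M) * mu)

# Example DOAs (degrees) mapped to spatial frequencies via (1)
theta = np.deg2rad([60.0, 90.0, 120.0])
mu = np.cos(theta)
assert np.all((mu >= -1.0) & (mu < 1.0))   # each mu_l lies in [-1, 1)

a = steering_vector(mu[0], M=6)
assert a[0] == 1.0                          # first sensor is the phase reference
assert np.isclose(np.linalg.norm(a) ** 2, 6.0)   # unit-modulus entries: ||a||^2 = M
```

Note that ‖a(µ)‖² = M holds for any unit-modulus steering vector; this property is used later in (27).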
STEFFENS et al.: COMPACT FORMULATION FOR THE ℓ2,1 MIXED-NORM MINIMIZATION PROBLEM 1485
Fig. 1. Exemplary setup for a linear array of M = 6 sensors and L = 3 source signals.

Fig. 2. Signal model and sparse representation (neglecting additive noise and basis mismatch) for M = 6 sensors, L = 3 source signals and K = 12 grid points.
the resulting vector x^(ℓ_p). The inner ℓ_p norm provides a nonlinear coupling among the elements in a row, leading to the desired row-sparse structure of the signal matrix X. Ideally, considering the representation in (5) with the row-sparse structure in (7), we desire a problem formulation containing an ℓ_{p,0} pseudo-norm, leading, however, to an NP-complete problem, such that convex relaxation in form of an ℓ_{p,1} mixed-norm is considered in practice to obtain computationally tractable problems. In the SMV case, i.e., N = 1, the ℓ_{p,1} mixed-norm reduces to the ℓ_1 norm, such that ℓ_{p,1} mixed-norm minimization can be considered as a generalization of the classical ℓ_1 norm minimization problem [2], [3] to the MMV case with N > 1. Common choices of mixed-norms are the ℓ_{2,1} norm [20], [37] and the ℓ_{∞,1} norm [21], [22]. Similar to the SMV case, recovery guarantees for the MMV-based joint SSR problem have been derived [26]–[28], providing conditions for the noiseless case under which the sparse signal matrix X can be perfectly reconstructed.

Given a row-sparse minimizer X̂ for (8), the DOA estimation problem reduces to identifying the union support set, i.e., the indices of the non-zero rows, from which the set of estimated spatial frequencies can be obtained as

    {µ̂_l}_{l=1}^{L̂} = {ν_k | ‖x̂_k‖_p > 0, k = 1, . . . , K}    (11)

where x̂_k corresponds to the kth row of the estimated signal matrix X̂ = [x̂_1, . . . , x̂_K]^T and L̂ denotes the number of non-zero rows in X̂, i.e., the estimated model order.

One major drawback of the mixed-norm minimization problem in (8) lies in its computational cost, which is determined by the size of the K × N source signal matrix X. A large number of grid points K is desired to improve the frequency resolution, while a large number of measurement vectors N is desired to improve the estimation performance. However, the choice of too large values K and N makes the problem computationally intractable. To reduce the computational cost in the MMV problem it was suggested in [37] to reduce the dimension of the M × N measurement matrix Y by matching only the signal subspace in form of an M × L matrix Y_SV, leading to the prominent ℓ_1-SVD method. A drawback of the ℓ_1-SVD method is that it requires knowledge of the number of source signals and that the estimation performance may deteriorate in the case of correlated source signals. In [49], [52] a related dimensionality reduction approach was proposed. Instead of only matching the signal subspace, the authors propose to match the signal and noise subspace in form of an M × M matrix Y_RD. It was shown in [52] that matching the matrix Y_RD results in the same estimate of the sparse spatial spectrum as matching the original measurement matrix Y. In case of a large number of measurement vectors N > M, both dimensionality reduction approaches result in reduced computational cost, since the dimension of the signal matrix X is equally reduced.

To achieve high frequency resolution it was further suggested in [37] to perform an adaptive grid refinement. In the special case of uniform linear arrays the ℓ_{2,1} mixed-norm minimization problem can equivalently be addressed in a gridless fashion by the atomic norm framework, discussed in the following section.

B. Atomic Norm Minimization

The concept of Atomic Norm Minimization (ANM) has been introduced in [16] as a unifying framework for different types of sparse recovery methods, such as ℓ_1 norm minimization for sparse vector reconstruction or nuclear norm minimization for low-rank matrix completion. In [17]–[19] ANM was introduced for gridless line spectral estimation from SMVs in uniform linear arrays (ULAs). The extension of ANM to MMVs under this setup was studied in [30]–[32] and will be reviewed in the following. Consider L source signals with spatial frequencies µ_1, . . . , µ_L, impinging on a ULA with sensor positions ρ_m = m − 1, for m = 1, . . . , M. The noise-free measurement matrix obtained at the array output is modeled as Y_0 = Σ_{l=1}^L a(µ_l) ψ_l^T, where the samples of the lth source signal are contained in the N × 1 vector ψ_l. In the ANM framework [30]–[32], the measurement matrix Y_0 is considered as a convex combination of atoms a(ν)b^H with b ∈ C^N, ‖b‖_2 = 1 and ν ∈ [−1, 1), i.e., in contrast to the previous section the frequencies ν are continuous and not restricted to lie on a grid. The atomic norm of Y_0 is defined as

    ‖Y_0‖_A = inf_{{c_k, b_k, ν_k}} { Σ_k c_k : Y_0 = Σ_k c_k a(ν_k) b_k^H, c_k ≥ 0 }.    (12)

For the special case of ULAs, it was shown in [16]–[19], [30]–[32] that the atomic norm in (12) can equivalently be computed by the semidefinite program (SDP)

    ‖Y_0‖_A = inf_{v, V_N} (1/2) Tr(V_N) + (1/(2M)) Tr(Toep(v))    (13a)

    s.t. [ V_N    Y_0^H
           Y_0    Toep(v) ] ⪰ 0.    (13b)

Given a solution to problem (13), the reconstruction of the spatial frequencies ν_k and magnitudes c_k, for k = 1, . . . , K, is performed by means of the Vandermonde decomposition: For ULAs the M × K matrix A = [a(ν_1), . . . , a(ν_K)] has a Vandermonde structure, such that the product a(ν_k)a^H(ν_k) exhibits a Toeplitz structure and

    Σ_{k=1}^K c_k a(ν_k) a^H(ν_k) = Toep(v),    (14)

where Toep(v) denotes a Hermitian Toeplitz matrix with v as its first column. As discussed in [17], the Caratheodory theorem [53]–[55] states that any Toeplitz matrix Toep(v) of rank K ≤ M can be represented by a Vandermonde decomposition according to (14) with K ≤ M distinct frequencies ν_1, . . . , ν_K and corresponding magnitudes c_1, . . . , c_K > 0. In practice, the Vandermonde decomposition of a Toeplitz matrix Toep(v) according to (14) can be obtained by first recovering the frequencies ν̂_k, e.g., by Prony's method [56], the matrix pencil approach [57] or linear prediction methods [58], where the frequency recovery is performed in a gridless fashion. The corresponding signal magnitudes in c = [c_1, . . . , c_K]^T can be reconstructed by solving the linear system

    A(ν̂) c = v,    (15)
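The Toeplitz structure in (14) and the magnitude-recovery step (15) can be checked numerically. The sketch below uses hypothetical frequencies and magnitudes, and assumes ULA steering vectors of the form [a(ν)]_m = e^{jπ(m−1)ν}:

```python
import numpy as np

# Hypothetical example: K = 3 atoms on an M = 8 element ULA,
# with steering vectors [a(nu)]_m = exp(j*pi*(m-1)*nu) (an assumption).
M, K = 8, 3
nu = np.array([-0.6, 0.1, 0.45])   # distinct frequencies in [-1, 1)
c = np.array([2.0, 1.0, 0.5])      # positive magnitudes

A = np.exp(1j * np.pi * np.outer(np.arange(M), nu))

# Left-hand side of (14): sum_k c_k a(nu_k) a^H(nu_k)
T = (A * c) @ A.conj().T

# T is Hermitian Toeplitz: every diagonal is constant
for d in range(M):
    diag = np.diagonal(T, offset=d)
    assert np.allclose(diag, diag[0])

# Magnitude recovery as in (15): solve A(nu) c = v, with v the first
# column of T, exploiting that the first entry of each steering vector is 1
v = T[:, 0]
c_rec = np.linalg.lstsq(A, v, rcond=None)[0]
assert np.allclose(c_rec.real, c, atol=1e-10)
```

Here the frequencies are given; in the gridless methods above they would first be recovered, e.g., by Prony's method or a matrix pencil approach, before (15) is solved.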
i.e., by exploiting that [a(ν)]_1 = 1, for all ν ∈ [−1, 1), and considering the first column in the representation (14).

As proposed in [30]–[32], given a noise-corrupted measurement matrix Y as defined in (2), gridless joint sparse recovery from MMVs can be performed by using (12) in the form of

    min_{Y_0} (1/2) ‖Y − Y_0‖_F^2 + λ√N ‖Y_0‖_A    (16)

or, equivalently, by using the SDP formulation in (13), as

    min_{v, V_N, Y_0} (1/2) ‖Y − Y_0‖_F^2 + (λ√N/2) ( Tr(V_N) + (1/M) Tr(Toep(v)) )    (17a)

    s.t. [ V_N    Y_0^H
           Y_0    Toep(v) ] ⪰ 0.    (17b)

Similar as for the ℓ_{2,1} mixed-norm minimization problem, the ANM problem suffers from a large number of optimization parameters in the matrix Y_0 in the case of a large number of MMVs N, such that dimensionality reduction techniques similar to those discussed in Section III-A have been proposed to reduce the computational cost [49]. Additionally, the dimensions of the semidefinite constraint (17b) grow with the number of sensors M and MMVs N, and the problem becomes intractable for large values of M and N. An implementation of the SDP based on the alternating direction method of multipliers (ADMM) has been proposed in [18], [59] to reduce the computational cost. However, for large problem sizes it was proposed in [60] to rather use grid-based formulations such as the ℓ_{2,1} mixed-norm minimization problem (8), which can be solved efficiently, instead of the SDP formulation in (17).

IV. SPARROW: A REFORMULATION OF THE ℓ_{2,1} MIXED-NORM MINIMIZATION PROBLEM

As discussed in Sections I and III, the MMV-based ℓ_{2,1} mixed-norm minimization problem is a well investigated problem with many fields of application. In this context, one of the main results of this manuscript is given by the following, novel problem reformulation:

Theorem 1: The row-sparsity inducing ℓ_{2,1} mixed-norm minimization problem

    min_X (1/2) ‖AX − Y‖_F^2 + λ√N ‖X‖_{2,1}    (18)

is equivalent to the convex problem

    min_{S ∈ D+} Tr((A S A^H + λI_M)^{-1} R̂) + Tr(S),    (19)

with R̂ = Y Y^H / N denoting the sample covariance matrix and D+ describing the set of nonnegative diagonal matrices, in the sense that minimizers X̂ and Ŝ for problems (18) and (19), respectively, are related by

    X̂ = Ŝ A^H (A Ŝ A^H + λI_M)^{-1} Y.    (20)

A proof of the equivalence is provided in Appendix A, while a proof of the convexity of (19) is provided in Appendix C by showing positive semidefiniteness of the Hessian matrix.

In addition to (20), we observe that the matrix Ŝ = diag(ŝ_1, . . . , ŝ_K) contains the row-norms of the sparse signal matrix X̂ = [x̂_1, . . . , x̂_K]^T on its diagonal according to

    ŝ_k = (1/√N) ‖x̂_k‖_2,    (21)

for k = 1, . . . , K, such that the union support of X̂ is equivalently represented by the support of the sparse vector of row-norms [ŝ_1, . . . , ŝ_K]. We will refer to (19) as SPARse ROW-norm reconstruction (SPARROW). In this regard, we emphasize that Ŝ should not be mistaken for a sparse representation of the source covariance matrix, i.e., Ŝ ≠ E{X̂ X̂^H}/N. While the mixed-norm minimization problem in (18) has NK complex variables in X, the SPARROW problem in (19) provides a reduction to only K nonnegative variables in the diagonal matrix S. However, the union support of X̂ is similarly provided by Ŝ. Moreover, the SPARROW problem in (19) only relies on the sample covariance matrix R̂ instead of the MMVs in Y themselves, leading to a reduction in problem size, especially in the case of a large number of MMVs N. Interestingly, this also indicates that the union support of the signal matrix X̂ is fully encoded in the sample covariance matrix R̂, rather than the instantaneous MMVs in Y, as may be concluded from the ℓ_{2,1} formulation in (18). Similar observations were made in [52] in the context of dimensionality reduction. As seen from (20), the instantaneous MMVs in Y are only required for the signal reconstruction, which, in the context of array signal processing, can be interpreted as a form of beamforming [34], where the row-sparse structure in X̂ is induced by premultiplication with the sparse diagonal matrix Ŝ. In contrast to the dimensionality reduction techniques discussed in Section III-A, the proposed SPARROW formulation in (19) admits a reduced number of variables while providing the same solution as the original ℓ_{2,1} mixed-norm minimization problem in (18). In comparison, the ℓ_1-SVD method in [37] requires a K × L matrix variable X_SV and thus has a significantly reduced number of parameters in case of a small number of sources L, but suffers from degraded estimation performance in case of incorrect subspace estimation. Conversely, the dimensionality reduction technique in [49], [52] provides the same estimation performance as the original ℓ_{2,1} mixed-norm minimization problem in (18), but requires a K × M matrix variable X_RD, i.e., it suffers from an increased number of parameters for a large number of sensors M, as compared to the SPARROW and ℓ_1-SVD methods.

To show convexity of the SPARROW formulation (19) and for implementation with standard convex solvers, such as MOSEK [61], consider the following corollaries [62]:

Corollary 1: The SPARROW problem in (19) is equivalent to the semidefinite program (SDP)

    min_{S, U_N} (1/N) Tr(U_N) + Tr(S)    (22a)

    s.t. [ U_N    Y^H
           Y      A S A^H + λI_M ] ⪰ 0    (22b)

         S ∈ D+    (22c)

where U_N is a Hermitian matrix of size N × N.
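As a numerical sanity check (not a substitute for the proof in Appendix A), the following sketch verifies that the data term of (19) equals (1/N) Tr(Y^H (A S A^H + λI_M)^{-1} Y), which is the choice of U_N used for Corollary 1, and evaluates the closed-form reconstruction (20) for an arbitrary feasible S. The grid, steering vectors, and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N, lam = 6, 12, 30, 0.5

# Hypothetical grid and half-wavelength ULA steering matrix (assumption)
mu_grid = np.linspace(-1.0, 1.0, K, endpoint=False)
A = np.exp(1j * np.pi * np.outer(np.arange(M), mu_grid))

Y = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
R_hat = Y @ Y.conj().T / N          # sample covariance, as in Theorem 1

s = np.zeros(K)
s[[2, 7]] = [1.5, 0.8]              # an arbitrary sparse, nonnegative S = diag(s)
W = np.linalg.inv(A @ np.diag(s) @ A.conj().T + lam * np.eye(M))

# Data term of (19) equals (1/N) Tr(Y^H W Y) by the cyclic property of the trace
lhs = np.real(np.trace(W @ R_hat))
rhs = np.real(np.trace(Y.conj().T @ W @ Y)) / N
assert abs(lhs - rhs) < 1e-10

# Closed-form reconstruction (20): X = S A^H (A S A^H + lam*I)^(-1) Y;
# rows of X vanish wherever s_k = 0, so diag(s) alone determines the support
X = np.diag(s) @ A.conj().T @ W @ Y
assert np.allclose(X[s == 0.0], 0.0)
```

This illustrates why the SPARROW problem only needs R̂, while the MMVs in Y enter solely through the final reconstruction step (20).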
To see the equivalence of the two problems, note that in (22) the matrix A S A^H + λI_M ≻ 0 is positive definite, since S ⪰ 0 and λ > 0. Further consider the Schur complement of the constraint (22b) [62]:

    U_N ⪰ Y^H (A S A^H + λI_M)^{-1} Y,    (23)

which implies that

    (1/N) Tr(U_N) ≥ (1/N) Tr(Y^H (A S A^H + λI_M)^{-1} Y) = Tr((A S A^H + λI_M)^{-1} R̂).    (24)

For any optimal point Ŝ of (19) we can construct a feasible point of (22) with the same objective function value by choosing U_N = Y^H (A Ŝ A^H + λI_M)^{-1} Y. Conversely, any optimal solution pair Û_N, Ŝ of (22) is also feasible for (19).

Corollary 2: The SPARROW formulation in (19) admits the equivalent problem formulation

    min_{S, U_M} Tr(U_M R̂) + Tr(S)    (25a)

    s.t. [ U_M    I_M
           I_M    A S A^H + λI_M ] ⪰ 0    (25b)

         S ∈ D+    (25c)

where U_M is a Hermitian matrix of size M × M.

The proof of Corollary 2 follows the same line of arguments as the proof of Corollary 1. In contrast to the constraint (22b), the dimension of the semidefinite constraint (25b) is independent of the number of MMVs N. It follows that either problem formulation (22) or (25) can be selected to solve the SPARROW problem in (19), depending on the number of MMVs N and the resulting dimension of the semidefinite constraint, i.e., (22) is preferable for N ≤ M and (25) is preferable otherwise. We remark that the SDP implementations in [32] have been derived using similar steps, i.e., employing the Schur complement to obtain linear matrix inequality constraints according to [62].

In the case of ULAs the steering matrix A has a Vandermonde structure and the matrix product A S A^H = Toep(u) forms a Toeplitz matrix, as discussed in Section III-B. Based on the uniqueness of the Vandermonde decomposition as discussed for (14), we rewrite problem (19) as the gridless (GL-)SPARROW formulation

    min_u Tr((Toep(u) + λI_M)^{-1} R̂) + (1/M) Tr(Toep(u))    (26a)

    s.t. Toep(u) ⪰ 0,    (26b)

where we additionally make use of the identity

    Tr(S) = (1/M) Tr(A S A^H) = (1/M) Tr(Toep(u)),    (27)

with the factor 1/M resulting from ‖a(ν)‖_2^2 = M, for all ν ∈ [−1, 1). Given a minimizer û of problem (26), the number of sources, i.e., the model order, can be directly estimated as

    L̂ = rank(Toep(û)),    (28)

while the frequencies {µ̂_l}_{l=1}^{L̂} and corresponding magnitudes {ŝ_l}_{l=1}^{L̂} can be estimated by Vandermonde decomposition according to (14). With the frequencies in {µ̂_l}_{l=1}^{L̂} and signal magnitudes in {ŝ_l}_{l=1}^{L̂}, the corresponding signal matrix X̂ can be reconstructed by application of (20).

We remark that a unique Vandermonde decomposition requires L̂ = rank(Toep(û)) < M. The rank L̂ can be interpreted as the counterpart of the number of non-zero elements in the minimizer Ŝ in the grid-based problems (22) and (25). Similarly as the regularization parameter λ determines the number of non-zero elements, i.e., the sparsity level of Ŝ, there always exists a value λ which yields a minimizer û of the gridless formulations (29) and (30) which fulfills L̂ = rank(Toep(û)) < M, such that a unique Vandermonde decomposition is obtained. We provide a description of the appropriate choice of the regularization parameter λ in Section VII.

For using standard convex solvers, we follow the ideas of Corollary 1 to reformulate (26) as the SDP

    min_{u, U_N} (1/N) Tr(U_N) + (1/M) Tr(Toep(u))    (29a)

    s.t. [ U_N    Y^H
           Y      Toep(u) + λI_M ] ⪰ 0    (29b)

         Toep(u) ⪰ 0.    (29c)

Alternatively, using the approach of Corollary 2, we define the gridless estimation problem

    min_{u, U_M} Tr(U_M R̂) + (1/M) Tr(Toep(u))    (30a)

    s.t. [ U_M    I_M
           I_M    Toep(u) + λI_M ] ⪰ 0    (30b)

         Toep(u) ⪰ 0.    (30c)

Comparing the GL-SPARROW formulation (29) and the ANM problem (17), we observe a similar structure in the objective functions and semidefinite constraints. In fact, both problems are equivalent, as given by the following theorem:

Theorem 2: The atomic norm minimization problem (16) and the corresponding SDP implementation (17) with auxiliary variable v are equivalent to the gridless SPARROW formulation (29) in the sense that the corresponding minimizers are related by

    û = v̂/√N.    (31)

A proof of Theorem 2 is given in Appendix B. For both problem formulations, GL-SPARROW (29) and ANM (17), the spatial frequencies ν are encoded in the vectors û and v̂, as found by Vandermonde decomposition (14), such that both formulations provide the same frequency estimates.

However, from a computational viewpoint, in contrast to the GL-SPARROW problem in (29), the ANM problem in (17) has additional MN complex variables in the matrix Y_0, which need to be matched to the MMV matrix Y by an additional quadratic term in the objective function. We remark that the
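The ULA structure exploited by the gridless formulations can be illustrated numerically: for a Vandermonde steering matrix, A S A^H is Hermitian Toeplitz and the trace identity (27) holds. A small sketch, assuming steering vectors [a(µ)]_m = e^{jπ(m−1)µ}:

```python
import numpy as np

# Hypothetical grid; ULA steering vectors [a(mu)]_m = exp(j*pi*(m-1)*mu) (assumption)
M, K = 6, 10
mu_grid = np.linspace(-1.0, 1.0, K, endpoint=False)
A = np.exp(1j * np.pi * np.outer(np.arange(M), mu_grid))

s = np.random.default_rng(2).random(K)   # arbitrary nonnegative diagonal of S
T = A @ np.diag(s) @ A.conj().T          # equals Toep(u) in the ULA case

# Every diagonal of T is constant, i.e., T is (Hermitian) Toeplitz
for d in range(M):
    diag = np.diagonal(T, offset=d)
    assert np.allclose(diag, diag[0])

# Trace identity (27): Tr(S) = Tr(A S A^H)/M, since ||a(mu)||_2^2 = M
assert np.isclose(s.sum(), np.real(np.trace(T)) / M)
```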
(32b), it can easily be verified that the optimal stepsize must fulfill d̂_k ≥ −s_k^{(τ)} > d̂_0, i.e., it must be located on the right hand side of the pole d̂_0, and the optimal stepsize according to (32) is computed as

    d̂_k = max( (√(a_k^H U_{k,τ}^{-1} R̂ U_{k,τ}^{-1} a_k) − 1) / (a_k^H U_{k,τ}^{-1} a_k), −s_k^{(τ)} ).    (41)

method in Algorithm 1. Then, every limit point of {S^{(τ)}} is a stationary point.

The assumptions of the uniqueness of the minimum and the monotonic nonincrease of f(S) in Proposition 1 are satisfied for our proposed approach, because f(S) is strictly convex in each component when all other components are held fixed, as discussed in Appendix C.
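The stepsize rule (41) is the exact minimizer of the SPARROW objective f(S) = Tr((A S A^H + λI_M)^{-1} R̂) + Tr(S) along the kth coordinate: by the Sherman-Morrison identity, with α = a_k^H U^{-1} R̂ U^{-1} a_k and β = a_k^H U^{-1} a_k, the unconstrained minimizer is (√α − 1)/β, clipped at −s_k^{(τ)} to keep s_k nonnegative. One coordinate descent sweep can be sketched as follows (a simplified sketch with hypothetical variable names; Algorithm 1 in the paper may differ in details such as update order or stopping criteria):

```python
import numpy as np

def sparrow_objective(A, R_hat, s, lam):
    """f(S) = Tr((A diag(s) A^H + lam*I)^(-1) R_hat) + Tr(diag(s))."""
    M = A.shape[0]
    U = A @ np.diag(s) @ A.conj().T + lam * np.eye(M)
    return np.real(np.trace(np.linalg.solve(U, R_hat))) + s.sum()

def sparrow_cd_sweep(A, R_hat, s, lam):
    """One coordinate descent sweep over s, using the stepsize rule (41)."""
    M, K = A.shape
    U_inv = np.linalg.inv(A @ np.diag(s) @ A.conj().T + lam * np.eye(M))
    for k in range(K):
        a = A[:, k]
        Ua = U_inv @ a
        alpha = np.real(Ua.conj() @ R_hat @ Ua)   # a^H U^-1 R_hat U^-1 a
        beta = np.real(a.conj() @ Ua)             # a^H U^-1 a
        d = max((np.sqrt(alpha) - 1.0) / beta, -s[k])   # clipped stepsize (41)
        if d != 0.0:
            s[k] += d
            # Sherman-Morrison update of U^-1 after the rank-one change d*a*a^H
            U_inv -= d * np.outer(Ua, Ua.conj()) / (1.0 + d * beta)
    return s

# Toy problem with two active rows (hypothetical data)
rng = np.random.default_rng(3)
M, K, N, lam = 6, 16, 25, 0.5
A = np.exp(1j * np.pi * np.outer(np.arange(M),
                                 np.linspace(-1.0, 1.0, K, endpoint=False)))
X = np.zeros((K, N), dtype=complex)
X[[3, 10]] = rng.standard_normal((2, N))
Y = A @ X + 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
R_hat = Y @ Y.conj().T / N

s = np.zeros(K)
f_before = sparrow_objective(A, R_hat, s, lam)
s = sparrow_cd_sweep(A, R_hat, s, lam)
f_after = sparrow_objective(A, R_hat, s, lam)
assert f_after <= f_before + 1e-9 and np.all(s >= 0.0)
```

Each coordinate step minimizes f exactly along one variable, so the objective is nonincreasing over a sweep, consistent with the convergence discussion above.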
matching function according to

    min_{P ∈ D+, σ ≥ 0} { ‖R_0^{-1/2} (R̂ − R_0)‖_F^2 : (46) }

    = min_{P ∈ D+, σ ≥ 0} { Tr(R_0^{-1} R̂^2) + Tr(R_0) − 2 Tr(R̂) : (46) },    (48)

where sparsity in P is induced in the objective of (48) in form of the trace penalty term Tr(R_0), as can be observed from the following identity:

    Tr(R_0) = σM + Σ_{k=1}^K ‖a_k‖_2^2 · p_k = σM + M Σ_{k=1}^K p_k.    (49)

The oversampled case, with N ≥ M, where the sample covariance matrix R̂ is non-singular, is treated by the minimization of the weighted covariance matching function according to

    min_{P ∈ D+, σ ≥ 0} { ‖R_0^{-1/2} (R̂ − R_0) R̂^{-1/2}‖_F^2 : (46) }

    = min_{P ∈ D+, σ ≥ 0} { Tr(R_0^{-1} R̂) + Tr(R_0 R̂^{-1}) − 2M : (46) },    (50)

where sparsity in P is induced by summation of its diagonal elements with data-dependent nonnegative weights according to

    Tr(R_0 R̂^{-1}) = σ Tr(R̂^{-1}) + Σ_{k=1}^K a_k^H R̂^{-1} a_k · p_k.    (51)

We remark that our proposed SPARROW formulation in (19) exhibits similarities with both SPICE formulations (48) and (50). While the SPARROW formulation shares the uniformly weighted summation of its variables in Tr(S) with the SPICE formulation in (48), it shares the structure of the data fitting function Tr((A S A^H + λI_M)^{-1} R̂) with the SPICE formulation in (50). There is, however, a fundamental difference between the SPARROW formulation and the SPICE formulations in the fact that the variables in S correspond to the normalized row-norms of the signal matrix, i.e., ŝ_k = (1/√N) ‖x̂_k‖_2, for k = 1, . . . , K, as seen from (21), while the variables in P correspond to the signal powers, i.e., p̂_k = (1/N) E{‖x̂_k‖_2^2}, for k = 1, . . . , K, as seen from (45) and (47).

Related links between SPICE and ℓ_{2,1} mixed-norm minimization have been presented, e.g., in [47], [48], where it has been shown that for the case of a single measurement vector y the SPICE problem in (48) is equivalent to the square-root LASSO

from those used in our proof of Theorem 1. Furthermore, there are some significant differences between the SR-LASSO formulation (52) and the standard mixed-norm formulation (18) considered here. The latter reduces to the popular standard LASSO [2] in the special case of a single measurement vector. As compared to the SR-LASSO (52), the standard LASSO has a squared data fitting term, such that for additive white Gaussian noise the standard LASSO admits an interpretation as a Bayesian estimator with Laplacian priors [2], [70]. Equivalence of the standard LASSO and the SR-LASSO only holds in the noise-free case, such that in this case the SPICE formulation in (48) is equivalent to standard ℓ_1 norm minimization. In contrast to that, the SPARROW formulation is equivalent to the standard mixed-norm minimization problem in the general and practically relevant case of noisy measurements.

Another major difference between the mixed-norm minimization problem in (18) and the SR-LASSO formulation in (52) lies in the absence of the regularization parameter λ in the latter approach. The mixed-norm problem (18) allows one to obtain a solution of any desired sparsity level by tuning the regularization parameter λ, e.g., by exploiting a-priori knowledge or by applying blind techniques such as the cross validation approach of [2]. The SR-LASSO in (52) does not have such a regularization parameter and thus provides less flexibility in the solution. On the other hand, since the selection of the regularization parameter can be quite challenging in practice, this makes the SR-LASSO, and correspondingly the SPICE method, easily applicable in practical scenarios [44]–[46], [71].

A gridless extension of SPICE to the GridLess SPICE (GLS) method for ULAs was proposed in [32], which relies on an SDP formulation of the SPICE problems (48) and (50), and on the Vandermonde decomposition of Toeplitz matrices, similar to the ANM and SPARROW problems discussed in Sections III-B and IV. In [32], [72] it has been shown that GLS can be interpreted as a special version of noise-free ANM (13). In contrast to the results in [32], [72], our results on the equivalence between gridless SPARROW and ANM in Section IV hold in the more general case with an additional data matching term in the ANM formulation to account for noise-corrupted measurements according to (17).

VII. NUMERICAL EXPERIMENTS

The parameter estimation performance of the ℓ_{2,1} mixed-norm minimization, ANM and SPICE has been numerically investigated in various publications, e.g., [30]–[32], [37], [38], [44]–[46]. Instead, we provide a comparison of the computation
(SR-LASSO) [69] time for the equivalent approaches discussed in this paper.
Regarding the choice of the regularization parameter in the
min Ax â y2 + x1 (52) SPARROW formulation, we follow the heuristic approach of
x
selecting
in the sense that the corresponding minimizers are related by
|xĚk | y2 Îť = Ď 2 M log M , (54)
xĚ = PĚ AH (APĚ AH + ËI)â1 y and pĚk = â . (53)
M as suggested for the single measurement vector problem in [18],
Similarly, it was shown in [45] that the SPICE formulation in which has provided good estimation performance for the sce-
(50) is equivalent to a weighted SR-LASSO formulation. We narios investigated in this manuscript.
point out that the line of arguments used in [45], [47], [48] All simulations are performed in Matlab on a computer with
to prove the above mentioned equivalences is rather different an Intel Core i7-4770 CPU @ 3.40 GHz Ă 8 and 16 GByte
1492 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 6, MARCH 15, 2018
Fig. 3. Average CPU time for equivalent methods with varying number of measurement vectors.
Fig. 4. Average CPU time for equivalent methods with varying number of sensors.
All simulations are performed in Matlab on a computer with an Intel Core i7-4770 CPU @ 3.40 GHz × 8 and 16 GByte RAM. For evaluation of the SDP reformulations of the SPARROW problem we employ the multi-purpose solver MOSEK [61] with the CVX MATLAB interface [73], [74]. For evaluation of the coordinate descent (CD) method proposed in Section V, we employ a C/C++ implementation of the CD methods for the SPARROW formulation and the ℓ2,1 mixed-norm minimization problem [66], respectively. To reduce the computational cost in both CD methods, zero coordinates s_k^(τ₀) = 0 and x_k^(τ₀) = 0 are excluded from computation in future iterations τ > τ₀. In all experiments, both CD methods are initialized with zero matrices, i.e., S⁽⁰⁾ = 0 and X⁽⁰⁾ = 0. We assume that convergence is achieved for both CD methods if the relative change of the objective function value f⁽τ⁾ in iteration τ fulfills |f⁽τ⁾ − f⁽τ⁻¹⁾| / f⁽τ⁾ ≤ 10⁻¹².

A. Number of Measurement Vectors

We consider a scenario with L = 3 independent complex Gaussian sources with static spatial frequencies μ₁ = −0.1, μ₂ = 0.35 and μ₃ = 0.5 and a ULA with M = 10 sensors. The signal-to-noise ratio (SNR) is fixed at SNR = 10 dB while the number of MMVs N is varied. Fig. 3 shows the average CPU time of ℓ2,1 mixed-norm minimization (18), the SPARROW formulations (22) and (25), atomic norm minimization (ANM) (17) and GL-SPARROW (29) and (30). For the grid-based methods we use a grid of size K = 1000.

Regarding the CPU time for the grid-based methods, it can be seen that the SPARROW formulation (22) outperforms the ℓ2,1 mixed-norm minimization (18) for N < 30 MMVs. For larger numbers of MMVs the dimensions of the semidefinite constraint (22b) become too large, such that the computational cost is increased as compared to ℓ2,1 mixed-norm minimization (18). The SPARROW formulation (25) is based on the sample covariance matrix and thus its computational cost is independent of the number of MMVs. For the gridless methods, Fig. 3 clearly displays that the CPU time of the GL-SPARROW formulation (29) is significantly reduced as compared to the ANM formulation (17). Similar to the grid-based case, the CPU time of the covariance-based GL-SPARROW formulation (30) is relatively independent of the number of MMVs N and outperforms the other methods for large numbers of MMVs N. Independent of the number of MMVs, the gridless SPARROW formulations (29) and (30) clearly outperform their grid-based counterparts (22) and (25). Comparing the coordinate descent implementations of the ℓ2,1 mixed-norm minimization problem and the SPARROW formulation in Fig. 3, it can be seen that the ℓ2,1 CD method has the highest computation time among all methods under consideration for all MMV numbers N ≤ 40, and the computation time increases with the number of MMVs, while the computation time of the SPARROW CD implementation is slightly lower than that of the grid-based MOSEK implementation and almost independent of the number of MMVs.

The experiment shows that all the methods employing the raw measurements in Y, i.e., ℓ2,1 mixed-norm minimization (18), the ℓ2,1 CD method, the SPARROW formulation (22), the ANM formulation (17) and the GL-SPARROW formulation (29), suffer from increased computation time in the case of a large number of measurement vectors N, demonstrating the necessity of dimensionality reduction techniques, as will be investigated in the following experiment.

B. Number of Sensors

We keep the scenario from the previous section with L = 3 source signals and fix the number of MMVs as N = 50 while varying the number of sensors M in the ULA. Fig. 4 displays the average CPU time for the various equivalent methods under investigation. To reduce the computational cost in the methods based on the M × N raw measurement matrix Y we perform dimensionality reduction according to [49] to match a matrix Y_RD of dimensions M × M instead, as discussed in Section III.

Using the dimensionality reduction technique, it can be seen that both grid-based SPARROW formulations (22) and (25) have the same computational cost, since the dimensions of the semidefinite constraints are identical. For M ≤ 18 sensors the grid-based SPARROW formulations outperform the ℓ2,1 mixed-norm minimization (18). However, for M > 18 the dimensions of the semidefinite constraints in the SPARROW formulations become too large, such that the computational cost exceeds that of the ℓ2,1 mixed-norm minimization (18).

Similar to the grid-based SPARROW, the gridless SPARROW formulations (29) and (30) show identical performance, due to the identical size of the semidefinite constraints. Both gridless SPARROW formulations clearly outperform the ANM approach (17), especially for large numbers of sensors M.
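The dimensionality reduction step can be illustrated with a small sketch. The exact construction of [49] is not restated here; the sketch below (hypothetical dimensions) only shows the underlying principle that the fitting criteria depend on Y solely through Y Yᴴ = N R̂, so any M × M matched factor Y_RD with Y_RD Y_RDᴴ = Y Yᴴ, e.g., a Cholesky factor, leaves the objective values unchanged while shrinking the data matrix from M × N to M × M:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 50  # hypothetical: many more snapshots than sensors

Y = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# The Cholesky factor of Y Y^H is an M x M matrix with the same outer
# product, hence the same sample covariance as the raw M x N matrix Y.
Y_rd = np.linalg.cholesky(Y @ Y.conj().T)

assert Y_rd.shape == (M, M)
assert np.allclose(Y_rd @ Y_rd.conj().T, Y @ Y.conj().T)
```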
STEFFENS et al.: COMPACT FORMULATION FOR THE 2 , 1 MIXED-NORM MINIMIZATION PROBLEM 1493
Fig. 5. Convergence rate of the coordinate descent implementations of SPARROW and ℓ2,1 mixed-norm minimization for varying numbers of source signals L, frequency grid points K, sensors M and MMVs N, as well as varying SNR, and the resulting runtimes t_SP and t_ℓ2,1 of the SPARROW and ℓ2,1 CD methods.
This can be explained by the additional M² complex variables in the matrix Y₀ of the ANM formulation (17).

With respect to the CD implementations it can be observed from Fig. 4 that both the ℓ2,1 and the SPARROW CD method show high computation times for small numbers of sensors, which can be explained by a high correlation of the atoms in the dictionary matrix A. With increasing number of sensors the atoms become less correlated and the computation time reduces to a constant value for the considered numbers of sensors, where the computation time of the SPARROW CD method is significantly lower than that of the ℓ2,1 CD method.

C. Coordinate Descent Method

As seen in the last section, the computational cost of the grid-based SPARROW formulations exceeds that of ℓ2,1 mixed-norm minimization for large numbers of sensors, when the SDP formulations are used with the MOSEK solver. To deal with this problem we have presented a low-complexity coordinate descent (CD) implementation in Section V which exploits the special structure of our proposed SPARROW formulation.

For evaluation of the proposed SPARROW CD method and for comparison to the CD method for ℓ2,1 mixed-norm minimization (ℓ2,1 CD) [66], Fig. 5 displays the convergence rate of the two methods for various scenarios in terms of the objective function value f⁽τ⁾ in iteration τ as compared to the optimum value f̂ of the objective function.² To allow comparison with the computation of the SDP formulations in the previous sections, the corresponding runtimes are provided below the plots, where t_SP and t_ℓ2,1 denote the CPU time of the SPARROW and

² For better comparison of the two CD methods, the objective function of the scaled SPARROW formulation in (63) is used here.
the ℓ2,1 CD methods, respectively. For all scenarios the sensor positions, for m = 1, …, M, are selected uniformly at random in the interval [0, M], while the spatial frequencies μ_l, for l = 1, …, L, are selected uniformly at random in the interval [−1, 1), with a minimum spacing of min |μ_l − μ_k| ≥ 0.02, for l, k = 1, …, L with l ≠ k.

Figs. 5(a)–(c) illustrate how the convergence rates of the two CD methods reduce with increasing number of grid points, which can be explained by the corresponding increase in the correlation of the atoms in the dictionary matrix A. The convergence behavior for varying numbers of MMVs is illustrated in Figs. 5(d)–(f), showing that the convergence rate of both CD methods is essentially independent of the number of MMVs. However, comparing the runtimes of both CD methods it can be observed that the ℓ2,1 CD method requires higher computation time due to the increased number of operations as compared to the SPARROW CD method. In contrast to that, Figs. 5(g)–(i) show that the convergence rates of the CD methods slightly decrease with increasing SNR, especially for the ℓ2,1 method. This effect can be explained by the corresponding change in the regularization parameter according to (54), where a higher SNR results in a smaller regularization parameter λ, which in turn causes a reduced convergence rate. The effect of varying the number of sensors is displayed in Figs. 5(j)–(l), where it can be observed that the convergence rate improves with increasing number of sensors. As discussed for Figs. 4 and 5(a)–(c), this effect can be explained by the reduced correlation of the atoms in the dictionary matrix A for larger numbers of sensors M at a constant number of atoms K. Clearly, the results show that for all scenarios SPARROW CD outperforms the ℓ2,1 CD method, in convergence rate as well as in runtime.

VIII. CONCLUSION

[…] SPARROW formulation provides significant savings in computational cost as compared to ℓ2,1 mixed-norm and atomic norm minimization, when applied in standard convex solvers and coordinate descent methods.

APPENDIX A
EQUIVALENCE OF SPARROW AND ℓ2,1 MIXED-NORM MINIMIZATION

Proof of Theorem 1: A key component in establishing the equivalence of problems (18) and (19) is the observation that the ℓ2 norm of a vector x_k can be rewritten as

    ‖x_k‖₂ = min_{γ_k, g_k}  ½ (|γ_k|² + ‖g_k‖₂²)                              (55a)
             s.t.  γ_k g_k = x_k,                                              (55b)

where γ_k is a complex scalar and g_k is a complex vector of dimension N × 1, similar to x_k. For the optimal solution of (55), it holds that

    ‖x_k‖₂ = |γ_k|² = ‖g_k‖₂².                                                 (56)

To see this, consider that any feasible solution must fulfill

    ‖x_k‖₂ = |γ_k| ‖g_k‖₂ ≤ ½ (|γ_k|² + ‖g_k‖₂²),                              (57)

which constitutes the inequality of arithmetic and geometric means, with equality holding if and only if |γ_k| = ‖g_k‖₂. We can extend the idea in (55) to the ℓ2,1 mixed-norm of the source signal matrix X = [x₁, …, x_K]ᵀ composed of rows x_kᵀ, for k = 1, …, K, by

    ‖X‖₂,₁ = Σ_{k=1}^{K} ‖x_k‖₂ = min_{Γ∈D, G}  ½ (‖Γ‖_F² + ‖G‖_F²)            (58a)
             s.t.  Γ G = X.                                                    (58b)
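The variational characterization (55)–(57) can also be checked numerically: eliminating the constraint (55b) via g_k = x_k/γ_k, the objective in (55a) becomes ½(|γ_k|² + ‖x_k‖₂²/|γ_k|²), which by the AM-GM argument attains its minimum ‖x_k‖₂ at |γ_k|² = ‖x_k‖₂. A small NumPy sketch with a random test vector:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def objective(gamma):
    # g = x / gamma satisfies the constraint gamma * g = x in (55b)
    g = x / gamma
    return 0.5 * (abs(gamma) ** 2 + np.linalg.norm(g) ** 2)

# AM-GM optimum: |gamma|^2 = ||x||_2, where the objective equals ||x||_2
gamma_opt = np.sqrt(np.linalg.norm(x))
assert np.isclose(objective(gamma_opt), np.linalg.norm(x))

# any other feasible gamma can only increase the objective value
for gamma in [0.3, 1.0, 2.5, 1.7 + 0.4j]:
    assert objective(gamma) >= np.linalg.norm(x) - 1e-12
```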
Upon substituting Y Yᴴ = N R̂ and defining the nonnegative diagonal matrix

    S = Γ Γᴴ / √N ∈ D₊                                                         (62)

we can rewrite (61) as the problem

    min_{S∈D₊}  Tr((A S Aᴴ + λ I_M)⁻¹ R̂) + (λN/2) Tr(S).                      (63)

Ignoring the factor λN/2 in (63), we arrive at formulation (19). From equation (56) and the definition of S = diag(s₁, …, s_K) in (62) we furthermore conclude that

    s_k = ‖x_k‖₂ / √N,                                                         (64)

for k = 1, …, K, as given by (21). Making further use of the factorization in (58b) we obtain

    X̂ = Γ̂ Ĝ = Γ̂ Γ̂ᴴ Aᴴ (A Γ̂ Γ̂ᴴ Aᴴ + λ√N I_M)⁻¹ Y
             = Ŝ Aᴴ (A Ŝ Aᴴ + λ I_M)⁻¹ Y,                                     (65)

which is (20).

[…] the GL-SPARROW constraint (66b) we can rewrite

    [ Û_N/√N   Yᴴ                    ]
    [ Y        √N Toep(û) + λ√N I_M ]
        = [ V̂_N   Ŷ₀ᴴ     ]  +  λ√N [ Zᴴ Z   Zᴴ  ]  ⪰ 0.                     (70)
          [ Ŷ₀    Toep(v̂) ]         [ Z      I_M ]

From (70) it can be seen that any minimizers (V̂_N, v̂, Ŷ₀) of the ANM problem (17), fulfilling constraint (17b), are feasible for the GL-SPARROW problem (66), since

    λ√N [ Zᴴ Z   Zᴴ  ]  =  λ√N [ Zᴴ  ] [ Z   I_M ]  ⪰ 0.                      (71)
        [ Z      I_M ]         [ I_M ]

In the second step we prove that the optimal point (û, Û_N) of (66) is feasible for the ANM problem (17). According to Corollary 1 we can assume w.l.o.g. that

    Û_N = Yᴴ (Toep(û) + λ I_M)⁻¹ Y                                            (72)

is optimal for (66). Using (69) and (72) in (68) and solving for V̂_N results in

    V̂_N = (1/√N) Yᴴ (Toep(û) + λ I_M)⁻¹ Toep(û) (Toep(û) + λ I_M)⁻¹ Y.      (73)
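The feasibility argument in (70)–(71) rests on the slack term being positive semidefinite by construction. The following sketch (hypothetical dimensions and a randomly drawn Z) checks the rank factorization in (71) and the resulting positive semidefiniteness:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 5, 7
lam = 0.8  # regularization parameter lambda > 0

Z = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# slack term lambda*sqrt(N) * [[Z^H Z, Z^H], [Z, I_M]] appearing in (70)
top = np.hstack([Z.conj().T @ Z, Z.conj().T])
bottom = np.hstack([Z, np.eye(M)])
slack = lam * np.sqrt(N) * np.vstack([top, bottom])

# factorization as in (71): the block matrix equals F F^H with F = [Z^H; I_M]
F = np.vstack([Z.conj().T, np.eye(M)])
assert np.allclose(slack, lam * np.sqrt(N) * F @ F.conj().T)

# hence the slack term is positive semidefinite (up to rounding)
assert np.linalg.eigvalsh(slack).min() > -1e-10
```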
where vecd(X) denotes the vector containing the elements on the main diagonal of matrix X. The Hessian matrix of (75) is computed as

    ∂²f(s) / (∂s ∂sᵀ) = 2 Re{(Aᴴ Q⁻¹ A)ᵀ ⊙ (Aᴴ Q⁻¹ R̂ Q⁻¹ A)},                (79)

with ⊙ denoting the Hadamard product, i.e., elementwise multiplication. From the Schur product theorem [75] it can be concluded that the Hessian matrix in (79) is positive semidefinite, since for S = diag(s₁, …, s_K) ⪰ 0 it holds that Q ≻ 0. In other words, the SPARROW formulation in (75), and (19), respectively, is convex for nonnegative diagonal matrices S. Considering only a single component s_k of the objective (75), we obtain the second order derivative

    ∂²f(s) / ∂s_k² = 2 (a_kᴴ Q⁻¹ a_k) · (a_kᴴ Q⁻¹ R̂ Q⁻¹ a_k),                 (80)

which is strictly greater than zero, i.e., the objective function of the SPARROW formulation is strictly convex in its single components.

REFERENCES

[1] C. Steffens, M. Pesavento, and M. Pfetsch, "A compact formulation for the ℓ2,1 mixed-norm minimization problem," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2017, pp. 1-5.
[2] R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. Roy. Statist. Soc. Series B, Methodological, vol. 58, pp. 267-288, 1996.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, pp. 33-61, 1998.
[4] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[5] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[6] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[7] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207-1223, Aug. 2006.
[8] E. J. Candès and J. Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions," Found. Comput. Math., vol. 6, no. 2, pp. 227-254, 2006.
[9] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Nat. Acad. Sci., vol. 100, no. 5, pp. 2197-2202, 2003.
[10] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 520-544, Jan. 2010.
[11] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397-3415, Dec. 1993.
[12] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007.
[13] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301-321, 2009.
[14] E. J. Candès and C. Fernandez-Granda, "Super-resolution from noisy data," J. Fourier Anal. Appl., vol. 19, no. 6, pp. 1229-1254, 2013.
[15] E. J. Candès and C. Fernandez-Granda, "Towards a mathematical theory of super-resolution," Commun. Pure Appl. Math., vol. 67, no. 6, pp. 906-956, 2014.
[16] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Found. Comput. Math., vol. 12, no. 6, pp. 805-849, 2012.
[17] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, "Compressed sensing off the grid," IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7465-7490, Nov. 2013.
[18] B. N. Bhaskar, G. Tang, and B. Recht, "Atomic norm denoising with applications to line spectral estimation," IEEE Trans. Signal Process., vol. 61, no. 23, pp. 5987-5999, Dec. 2013.
[19] G. Tang, B. N. Bhaskar, and B. Recht, "Near minimax line spectral estimation," IEEE Trans. Inf. Theory, vol. 61, no. 1, pp. 499-512, Jan. 2015.
[20] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Roy. Statist. Soc. Series B, Statist. Methodol., vol. 68, no. 1, pp. 49-67, 2006.
[21] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Process., vol. 86, no. 3, pp. 589-602, 2006.
[22] B. A. Turlach, W. N. Venables, and S. J. Wright, "Simultaneous variable selection," Technometrics, vol. 47, no. 3, pp. 349-363, 2005.
[23] M. Kowalski, "Sparse regression using mixed norms," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 303-324, 2009.
[24] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Process., vol. 86, no. 3, pp. 572-588, 2006.
[25] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477-2488, Jul. 2005.
[26] Y. Jin and B. Rao, "Support recovery of sparse signals in the presence of multiple measurement vectors," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 3139-3157, May 2013.
[27] J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634-4643, Dec. 2006.
[28] M.-J. Lai and Y. Liu, "The null space property for sparse recovery from multiple measurement vectors," Appl. Comput. Harmon. Anal., vol. 30, no. 3, pp. 402-406, 2011.
[29] M. Davies and Y. Eldar, "Rank awareness in joint sparse recovery," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1135-1146, Feb. 2012.
[30] Y. Li and Y. Chi, "Off-the-grid line spectrum denoising and estimation with multiple measurement vectors," IEEE Trans. Signal Process., vol. 64, no. 5, pp. 1257-1269, Mar. 2016.
[31] Z. Yang and L. Xie, "Exact joint sparse frequency recovery via optimization methods," IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5145-5157, Oct. 2016.
[32] Z. Yang and L. Xie, "On gridless sparse methods for line spectral estimation from complete and incomplete data," IEEE Trans. Signal Process., vol. 63, no. 12, pp. 3139-3153, Jun. 2015.
[33] H. Krim and M. Viberg, "Two decades of array signal processing research: The parametric approach," IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67-94, Jul. 1996.
[34] H. L. van Trees, Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory. New York, NY, USA: Wiley, 2002.
[35] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276-280, Mar. 1986.
[36] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramér-Rao bound," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 5, pp. 720-741, May 1989.
[37] D. Malioutov, M. Çetin, and A. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010-3022, Aug. 2005.
[38] M. M. Hyder and K. Mahata, "Direction-of-arrival estimation using a mixed ℓ2,0 norm approximation," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4646-4655, Sep. 2010.
[39] J. Kim, O. K. Lee, and J. C. Ye, "Compressive MUSIC: A missing link between compressive sensing and array signal processing," IEEE Trans. Inf. Theory, vol. 58, no. 1, pp. 278-301, Jan. 2012.
[40] J. A. Högbom, "Aperture synthesis with a non-regular distribution of interferometer baselines," Astron. Astrophys. Suppl. Series, vol. 15, pp. 417-426, Jun. 1974.
[41] I. Gorodnitsky and B. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600-616, Mar. 1997.
[42] L. Blanco and M. Najar, "Sparse covariance fitting for direction of arrival estimation," EURASIP J. Adv. Signal Process., vol. 2012, no. 1, 2012, Art. no. 111.
[43] J. Zheng and M. Kaveh, "Sparse spatial spectral estimation: A covariance fitting algorithm, performance and regularization," IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2767-2777, Jun. 2013.
[44] P. Stoica, P. Babu, and J. Li, "New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 35-47, Jan. 2011.
[45] P. Stoica, P. Babu, and J. Li, "SPICE: A sparse covariance-based estimation method for array processing," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 629-638, Feb. 2011.
[46] P. Stoica, D. Zachariah, and J. Li, "Weighted SPICE: A unifying approach for hyperparameter-free sparse estimation," Digit. Signal Process., vol. 33, pp. 1-12, 2014.
[47] C. Rojas, D. Katselis, and H. Hjalmarsson, "A note on the SPICE method," IEEE Trans. Signal Process., vol. 61, no. 18, pp. 4545-4551, Sep. 2013.
[48] P. Babu and P. Stoica, "Connection between SPICE and square-root LASSO for sparse parameter estimation," Signal Process., vol. 95, pp. 10-14, 2014.
[49] Z. Yang and L. Xie, "Enhancing sparsity and resolution via reweighted atomic norm minimization," IEEE Trans. Signal Process., vol. 64, no. 4, pp. 995-1006, Feb. 2016.
[50] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, "Sensitivity to basis mismatch in compressed sensing," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182-2195, May 2011.
[51] M. A. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 342-349, 2010.
[52] Z. Yang, J. Li, P. Stoica, and L. Xie, "Sparse methods for direction-of-arrival estimation," in Academic Press Library in Signal Processing: Array, Radar and Communications Engineering, 1st ed., S. Theodoridis and R. Chellappa, Eds. New York, NY, USA: Academic, Oct. 2017, vol. 7, ch. 11.
[53] C. Carathéodory, "Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen," Rendiconti del Circolo Matematico di Palermo (1884-1940), vol. 32, no. 1, pp. 193-217, 1911.
[54] C. Carathéodory and L. Fejér, "Über den Zusammenhang der Extremen von harmonischen Funktionen mit ihren Koeffizienten und über den Picard-Landauschen Satz," Rendiconti del Circolo Matematico di Palermo (1884-1940), vol. 32, no. 1, pp. 218-239, 1911.
[55] O. Toeplitz, "Zur Theorie der quadratischen und bilinearen Formen von unendlich vielen Veränderlichen," Mathematische Annalen, vol. 70, no. 3, pp. 351-376, 1911.
[56] G. de Prony, "Essai expérimental et analytique: sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alcool à différentes températures," J. de l'École Polytechnique, vol. 1, no. 22, pp. 24-76, 1795.
[57] Y. Hua and T. K. Sarkar, "Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 5, pp. 814-824, May 1990.
[58] D. W. Tufts and R. Kumaresan, "Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood," Proc. IEEE, vol. 70, no. 9, pp. 975-989, Sep. 1982.
[59] Z. Yang and L. Xie, "Continuous compressed sensing with a single or multiple measurement vectors," in Proc. IEEE Workshop Statist. Signal Process., Jun. 2014, pp. 288-291.
[60] G. Tang, B. Bhaskar, and B. Recht, "Sparse recovery over continuous dictionaries: Just discretize," in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2013, pp. 1043-1047.
[61] MOSEK ApS, The MOSEK Optimization Toolbox for MATLAB Manual, Version 7.1 (Revision 28), 2015. [Online]. Available: http://docs.mosek.com/7.1/toolbox/index.html
[62] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49-95, 1996.
[63] W. Suleiman, C. Steffens, A. Sorg, and M. Pesavento, "Gridless compressed sensing for fully augmentable arrays," in Proc. 25th Eur. Signal Process. Conf., Kos Island, Greece, Sep. 2017, pp. 1986-1990.
[64] C. Steffens, W. Suleiman, A. Sorg, and M. Pesavento, "Gridless compressed sensing under shift-invariant sampling," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2017, pp. 4735-4739.
[65] C. Steffens and M. Pesavento, "Block- and rank-sparse recovery for direction finding in partly calibrated arrays," IEEE Trans. Signal Process., vol. 66, no. 2, pp. 384-399, Jan. 2018.
[66] Z. Qin, K. Scheinberg, and D. Goldfarb, "Efficient block-coordinate descent algorithms for the group LASSO," Math. Program. Comput., vol. 5, no. 2, pp. 143-169, 2013.
[67] S. J. Wright, "Coordinate descent algorithms," Math. Program., vol. 151, no. 1, pp. 3-34, 2015.
[68] D. Bertsekas, Nonlinear Programming, 3rd ed. Belmont, MA, USA: Athena Scientific, 2016.
[69] A. Belloni, V. Chernozhukov, and L. Wang, "Square-root LASSO: Pivotal recovery of sparse signals via conic programming," Biometrika, vol. 98, no. 4, pp. 791-806, 2011.
[70] T. Park and G. Casella, "The Bayesian Lasso," J. Amer. Statist. Assoc., vol. 103, no. 482, pp. 681-686, 2008.
[71] P. Stoica and P. Babu, "SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation," Signal Process., vol. 92, no. 7, pp. 1580-1590, 2012.
[72] Z. Yang and L. Xie, "On gridless sparse methods for multi-snapshot DOA estimation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2016, pp. 3236-3240.
[73] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. New York, NY, USA: Springer-Verlag, 2008, pp. 95-110.
[74] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1," http://cvxr.com/cvx, Mar. 2014.
[75] R. Horn and C. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1990.

Christian Steffens received the Dipl.-Ing. degree in electrical engineering from the University of Bremen, Bremen, Germany, in 2010. From 2010 to 2016, he held a research position with the Communication Systems Group, Technical University of Darmstadt, Darmstadt, Germany, where his research focused on sparse signal reconstruction, parameter estimation, array processing, and sensor networks. Since 2017, he has been working with Telespazio VEGA, Darmstadt, Germany. He was the recipient of a Student Best Paper Award at the IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM) 2014.

Marius Pesavento received the Dipl.-Ing. and M.Eng. degrees from Ruhr-University Bochum, Bochum, Germany, and McMaster University, Hamilton, ON, Canada, in 1999 and 2000, respectively, and the Dr.-Ing. degree in electrical engineering from Ruhr-University Bochum, in 2005. Between 2005 and 2009, he held research positions in two start-up companies in the ICT area. In 2010, he became an Assistant Professor of robust signal processing and, in 2013, a Full Professor of communication systems with the Department of Electrical Engineering and Information Technology, Technical University Darmstadt, Darmstadt, Germany. His research interests include robust signal processing and adaptive beamforming, high-resolution sensor array processing, multiantenna and multiuser communication systems, distributed, sparse, and mixed-integer optimization techniques for signal processing, communications and machine learning, statistical signal processing, spectral analysis, and parameter estimation. He was the recipient of the 2003 ITG/VDE Best Paper Award, the 2005 Young Author Best Paper Award of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, and the 2010 Best Paper Award of the CrownCOM conference. He is a Member of the Editorial Board of the EURASIP Signal Processing Journal, and served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING in 2012-2016. He is a Member of the Sensor Array and Multichannel Technical Committee of the IEEE Signal Processing Society, and of the Special Area Teams "Signal Processing for Communications and Networking" and "Signal Processing for Multisensor Systems" of the EURASIP.

Marc E. Pfetsch received the Diploma degree in mathematics from the University of Heidelberg, Heidelberg, Germany, in 1997, the Ph.D. degree in mathematics, in 2002, and the Habilitation degree, in 2008, from Technische Universität (TU) Berlin, Berlin, Germany. From 2008 to 2012, he was a Full Professor of mathematical optimization with TU Braunschweig, Braunschweig, Germany. Since April 2012, he has been a Full Professor of discrete optimization with TU Darmstadt, Darmstadt, Germany. His research interests include discrete optimization, in particular symmetry in integer programs, compressed sensing, and algorithms for mixed integer programs.