
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 66, NO. 6, MARCH 15, 2018

A Compact Formulation for the ℓ2,1 Mixed-Norm Minimization Problem

Christian Steffens, Marius Pesavento, and Marc E. Pfetsch

Abstract—Parameter estimation from multiple measurement vectors (MMVs) is a fundamental problem in many signal processing applications, e.g., spectral analysis and direction-of-arrival estimation. Recently, this problem has been addressed using prior information in form of a jointly sparse signal structure. A prominent approach for exploiting joint sparsity considers mixed-norm minimization in which, however, the problem size grows with the number of measurements and the desired resolution, respectively. In this work, we derive an equivalent, compact reformulation of the ℓ2,1 mixed-norm minimization problem that provides new insights on the relation between different existing approaches for jointly sparse signal reconstruction. The reformulation builds upon a compact parameterization, which models the row-norms of the sparse signal representation as parameters of interest, resulting in a significant reduction of the MMV problem size. Given the sparse vector of row-norms, the jointly sparse signal can be computed from the MMVs in closed form. For the special case of uniform linear sampling, we present an extension of the compact formulation for gridless parameter estimation by means of semidefinite programming. Furthermore, we prove in this case the exact equivalence between our compact problem formulation and the atomic-norm minimization. Additionally, for the case of irregular sampling or a large number of samples, we present a low complexity, grid-based implementation based on the coordinate descent method.

Index Terms—Multiple measurement vectors, joint sparsity, mixed-norm minimization, gridless estimation.

Manuscript received May 8, 2017; revised November 21, 2017; accepted December 11, 2017. Date of publication January 1, 2018; date of current version February 1, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gonzalo Mateos. This work was supported by the EXPRESS project within the DFG priority program CoSIP (DFG-SPP 1798). This paper was presented in part at the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, March 2017. (Corresponding author: Christian Steffens.)
C. Steffens and M. Pesavento are with the Communication Systems Group, Technische Universität Darmstadt, Darmstadt 64283, Germany (e-mail: [email protected]; [email protected]).
M. E. Pfetsch is with the Discrete Optimization Group, Technische Universität Darmstadt, Darmstadt 64293, Germany (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2017.2788431

I. INTRODUCTION

SPARSE Signal Reconstruction (SSR) techniques have gained a considerable research interest over the last decades [2]–[9]. Traditionally, SSR considers the problem of reconstructing a high-dimensional sparse signal vector from a low-dimensional Single Measurement Vector (SMV), which is characterized by an underdetermined system of linear equations. It has been shown that exploiting prior knowledge on the sparsity structure of the signal admits a unique solution to the underdetermined system. In the signal processing context, this implies that far fewer samples than postulated by the Shannon-Nyquist sampling theorem for bandlimited signals are required for perfect signal reconstruction [10]. While SSR under the classical ℓ0 formulation constitutes a combinatorial and NP-complete optimization problem, several methods exist to approximately solve the SSR problem. Most prominent methods are based on convex relaxation in terms of ℓ1 norm minimization, which makes the SSR problem computationally tractable while providing sufficient conditions for exact recovery [2]–[9], or greedy methods, such as OMP [11], [12] and CoSaMP [13], which have low computational cost but provide reduced recovery guarantees.

In the context of parameter estimation, e.g., in Direction-Of-Arrival (DOA) estimation, the SSR problem has been extended to an infinite-dimensional vector space by means of total variation norm and atomic norm minimization [14]–[19], leading to gridless parameter estimation methods.

Besides the aforementioned SMV problem, many practical applications deal with the problem of finding a jointly sparse signal representation from Multiple Measurement Vectors (MMVs), also referred to as the multiple snapshot estimation problem. Similar to the SMV case, approximate methods for the MMV-based SSR problem include convex relaxation by means of mixed-norm minimization [20]–[23], and greedy methods [24], [25]. Recovery guarantees for the MMV case have been established in [26]–[29]. An extension to the infinite-dimensional vector space for MMV-based SSR, using atomic norm minimization, has been proposed in [30]–[32].

Apart from SSR, MMV-based parameter estimation is a classical problem in array signal processing [33], [34]. Prominent applications in array processing include beamforming and DOA estimation. Beamforming considers the problem of signal reconstruction in the presence of noise and interference while DOA estimation falls within the concept of parameter estimation and is addressed, e.g., by the subspace-based MUSIC method [35]. The MUSIC method has been shown to perform asymptotically optimal [36] and offers the super-resolution property at tractable computational cost. On the other hand, in the non-asymptotic case of low number of MMVs or correlated source signals, the performance of subspace-based estimation methods can drastically deteriorate such that SSR techniques provide an attractive alternative for these scenarios [37]–[39]. In fact, due to similar objectives in SSR and array signal processing, strong links between the two fields of research have been established in literature. The OMP has an array processing equivalent in


the CLEAN method [40] for source localization in radio astronomy, i.e., both methods rely on the same greedy estimation approach. In [25], [41] the authors present the FOCUSS method, which provides sparse estimates by iterative weighted norm minimization, with application to DOA estimation. SSR based on an ℓ2,0 mixed-norm approximation has been considered in [38], while a convex relaxation approach based on the ℓ2,1 mixed-norm has been proposed in [37]. DOA estimation based on second-order signal statistics has been addressed in [42], [43], where a sparse covariance matrix representation is exploited by application of a sparsity prior on the source covariance matrix, leading to an SMV-like sparse minimization problem. In [44]–[46] the authors propose the SPICE method, which is based on weighted covariance matching and constitutes a sparse estimation problem which does not require the assumption of a sparsity prior. Links between SPICE and SSR formulations have been established in [32], [45]–[48], which show that SPICE can be reformulated as an ℓ2,1 mixed-norm minimization problem.

In this paper we consider jointly sparse signal reconstruction from MMVs by means of the classical ℓ2,1 mixed-norm minimization problem, with application to DOA estimation in array signal processing. Compared to recently presented sparse methods such as SPICE [44]–[46] and atomic norm minimization [30]–[32], the classical ℓ2,1 formulation has the general shortcoming that its problem size grows with the number of measurements and the resolution requirement, respectively. Approaches to deal with the aforementioned problems have been presented, e.g., in [37], [49]. While the classical ℓ2,1 mixed-norm minimization problem has a large number of variables in the jointly sparse signal representation, in this paper we derive an equivalent problem reformulation based on a compact parameterization in which the optimization parameters represent the row-norms of the signal representation, rather than the signal matrix itself. We refer to this formulation as SPARse ROW-norm reconstruction (SPARROW). Given the sparse signal row-norms, the jointly sparse signal matrix is reconstructed from the MMVs in closed-form. We point out that support recovery is determined by the sparse vector of row-norms and only relies on the sample covariance matrix instead of the MMVs themselves. In this sense we achieve a concentration of the optimization variables as well as the measurements, leading to a significantly reduced problem size in the case of a large number of MMVs. Using standard concepts of semidefinite programming, we derive a gridless implementation of our SPARROW formulation for application in uniform sampling scenarios and prove its equivalence to atomic norm minimization. Furthermore, we present a low complexity implementation of our grid-based SPARROW formulation based on the coordinate descent method which is applicable to large and irregular sampling scenarios. To put our new problem formulation in context with other existing methods, we compare it to the SPICE method and our results extend the existing links between SPICE and ℓ2,1 mixed-norm minimization. We conclude our presentation by a short numerical analysis of the computational cost of our proposed SPARROW formulation which shows a significant reduction in the computational time of our proposed reformulation as compared to both equivalent formulations, the classical ℓ2,1 mixed-norm [20], [37] and the atomic norm [30]–[32] problem formulations.

In summary, our main contributions are the following:
• We derive an equivalent, compact reformulation of the classical ℓ2,1 mixed-norm minimization problem [20], [37], named SPARROW, with significantly reduced computational cost.
• We prove that a gridless implementation of the SPARROW formulation is equivalent to the atomic norm minimization problem [30]–[32], while having significantly reduced computational cost.
• We provide a low complexity implementation of the compact SPARROW formulation, based on the coordinate descent method, for application in large and irregular sampling scenarios, which shows improved convergence as compared to the non-compact case.
• We extend the available results on theoretical links between the ℓ2,1 mixed-norm minimization problem and the SPICE method [44]–[46].

The paper is organized as follows: In Section II we present the sensor array signal model. A short review of the classical ℓ2,1 mixed-norm minimization problem and the atomic norm minimization problem is provided in Section III, before the equivalent, compact SPARROW formulation is introduced in Section IV. A low complexity implementation of the SPARROW formulation is derived in Section V. Section VI provides a theoretical comparison of the SPARROW formulation and the SPICE method. Simulation results for comparison of the computational cost of the various formulations are presented in Section VII. Conclusions are provided in Section VIII.

Notation: Boldface uppercase letters X denote matrices, boldface lowercase letters x denote column vectors, and regular letters x, N denote scalars, with j denoting the imaginary unit. Superscripts X^T and X^H denote transpose and conjugate transpose of a matrix X, respectively. The sets of diagonal and nonnegative diagonal matrices are denoted as D and D+, respectively. We write [X]_{m,n} to indicate the element in the mth row and nth column of matrix X. The statistical expectation of a random variable x is denoted as E{x}, and the trace of a matrix X is referred to as Tr(X). The Frobenius norm and the ℓp,q mixed-norm of a matrix X are referred to as ‖X‖_F and ‖X‖_{p,q}, respectively, while the ℓp norm of a vector x is denoted as ‖x‖_p. Toep(u) describes a Hermitian Toeplitz matrix with u as its first column and diag(x) denotes a diagonal matrix with the elements in x on its main diagonal.

II. SIGNAL MODEL

Consider a linear array of M omnidirectional sensors, as depicted in Fig. 1. Further, assume a set of L narrowband far-field sources in angular directions θ_1, . . . , θ_L, summarized as θ = [θ_1, . . . , θ_L]^T. The corresponding spatial frequencies are defined as

μ_l = cos θ_l ∈ [−1, 1),   (1)

Fig. 1. Exemplary setup for a linear array of M = 6 sensors and L = 3 source signals.

Fig. 2. Signal model and sparse representation (neglecting additive noise and basis mismatch) for M = 6 sensors, L = 3 source signals and K = 12 grid points.

for l = 1, . . . , L, and comprised in the vector μ = [μ_1, . . . , μ_L]^T. The array output provides measurement vectors, also referred to as snapshots, which are recorded over N time instants, where we assume that the sources transmit time-varying signals while the spatial frequencies in μ remain constant within the entire observation time. The measurement vectors are collected in the multiple measurement vector (MMV) matrix Y ∈ C^{M×N}, where [Y]_{m,n} denotes the output of sensor m in time instant n. The MMV matrix is modeled as

Y = A(μ)Ψ + N,   (2)

where Ψ ∈ C^{L×N} is the source signal matrix, with [Ψ]_{l,n} denoting the signal transmitted by source l in time instant n, and N ∈ C^{M×N} represents circular and spatio-temporal white Gaussian sensor noise with covariance matrix E{N N^H}/N = σ²I_M, where I_M and σ² denote the M × M identity matrix and the noise power, respectively. The M × L array steering matrix A(μ) in (2) is given by

A(μ) = [a(μ_1), . . . , a(μ_L)],   (3)

where

a(μ) = [1, e^{−jπμρ_2}, . . . , e^{−jπμρ_M}]^T   (4)

is the array manifold vector with ρ_m ∈ R, for m = 1, . . . , M, denoting the position of the mth sensor in half signal wavelength, relative to the first sensor in the array, hence ρ_1 = 0.
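As an illustration of the signal model (2)–(4), the following Python/NumPy sketch generates an MMV matrix for a uniform linear array; all scenario values (M, L, N, the frequencies in mu and the noise power sigma2) are hypothetical choices for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 6, 3, 50                   # sensors, sources, snapshots (illustrative)
rho = np.arange(M)                   # ULA positions in half wavelengths, rho_1 = 0
mu = np.array([-0.1, 0.35, 0.5])     # spatial frequencies mu_l = cos(theta_l), eq. (1)

def steering_matrix(rho, mu):
    """Steering matrix A(mu) with columns a(mu_l) as in eqs. (3)-(4)."""
    return np.exp(-1j * np.pi * np.outer(rho, mu))

A_mu = steering_matrix(rho, mu)      # M x L
Psi = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
sigma2 = 0.1                         # noise power sigma^2
Noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, N))
                               + 1j * rng.standard_normal((M, N)))
Y = A_mu @ Psi + Noise               # MMV matrix, eq. (2)
R_hat = Y @ Y.conj().T / N           # sample covariance matrix used below
```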
III. JOINT SPARSE RECONSTRUCTION FROM MULTIPLE MEASUREMENT VECTORS

Two prominent approaches for joint sparse reconstruction from multiple measurement vectors are grid-based ℓ2,1 mixed-norm minimization [20], [37] and gridless atomic norm minimization [16]–[19], [30]–[32]. Both approaches result in convex optimization problems which can be solved in polynomial time. The two methods will shortly be reviewed in this section.

A. ℓ2,1 Mixed-Norm Minimization

We define a sparse representation of the model in (2) as

Y = A(ν)X + N,   (5)

with X denoting a K × N row-sparse signal matrix, and the M × K overcomplete dictionary matrix A(ν) is defined in correspondence to (3), where the vector ν = [ν_1, . . . , ν_K]^T is obtained by sampling the spatial frequencies in K ≫ L points ν_1, . . . , ν_K. For ease of notation we will drop the argument in the remainder of the paper and refer to the dictionary matrix as A = A(ν). We assume that the frequency grid is sufficiently fine, such that the true frequencies in μ are contained in the frequency grid ν, i.e.,

{μ_l}_{l=1}^L ⊂ {ν_k}_{k=1}^K.   (6)

Since the true frequencies in μ are not known in advance and the grid-size is limited in practice, the on-grid assumption (6) is usually not fulfilled, leading to spectral leakage effects and basis mismatch [50], [51] in the reconstructed signal. The atomic norm approach presented in Section III-B and our proposed gridless method in Section IV do not rely on the on-grid assumption. However, elsewhere we assume (6) to hold true for ease of presentation.

The K × N sparse signal matrix X in (5) contains the elements

[X]_{k,n} = [Ψ]_{l,n} if ν_k = μ_l, and [X]_{k,n} = 0 else,   (7)

for k = 1, . . . , K, l = 1, . . . , L and n = 1, . . . , N. Thus X exhibits a row-sparse structure, i.e., the elements in a row of X are either jointly zero or primarily non-zero, as illustrated in Fig. 2. To exploit the joint sparsity assumption in the estimation problem, it was proposed, e.g., in [20]–[23], [37], [38], to utilize a mixed-norm formulation leading to the classical ℓp,q mixed-norm minimization problem

min_X (1/2)‖AX − Y‖_F² + λ√N ‖X‖_{p,q}.   (8)

In (8), the data fitting ‖AX − Y‖_F² is performed by means of the Frobenius norm to optimally match the reconstructed measurements AX in the presence of additive white Gaussian noise. The regularization parameter λ > 0 admits balancing the data fitting fidelity versus the sparsity level in X, where the choice of a small λ in (8) tends to result in a large number of non-zero rows, whereas a large value of λ tends to result in a small number of non-zero rows. Joint sparsity on the rows x_k, for k = 1, . . . , K, of the signal matrix X = [x_1, . . . , x_K]^T is induced by the ℓp,q mixed-norm, which is defined as

‖X‖_{p,q} = ‖x^{(p)}‖_q,   (9)

where

x^{(p)} = [‖x_1‖_p, . . . , ‖x_K‖_p]^T,   (10)

i.e., an inner ℓp norm is applied on the rows of X to generate the vector of ℓp row-norms x^{(p)}, and an outer ℓq norm is applied on

the resulting vector x^{(p)}. The inner ℓp norm provides a nonlinear coupling among the elements in a row, leading to the desired row-sparse structure of the signal matrix X. Ideally, considering the representation in (5) with the row sparse structure in (7), we desire a problem formulation containing an ℓp,0 pseudo-norm, leading, however, to an NP-complete problem, such that convex relaxation in form of the ℓp,1 mixed-norm is considered in practice to obtain computationally tractable problems. In the SMV case, i.e., N = 1, the ℓp,1 mixed-norm reduces to the ℓ1 norm, such that ℓp,1 mixed-norm minimization can be considered as a generalization of the classical ℓ1 norm minimization problem [2], [3] to the MMV case with N > 1. Common choices of mixed-norms are the ℓ2,1 norm [20], [37] and the ℓ∞,1 norm [21], [22]. Similar to the SMV case, recovery guarantees for the MMV-based joint SSR problem have been derived [26]–[28], providing conditions for the noiseless case under which the sparse signal matrix X can be perfectly reconstructed.

Given a row-sparse minimizer X̂ for (8), the DOA estimation problem reduces to identifying the union support set, i.e., the indices of the non-zero rows, from which the set of estimated spatial frequencies can be obtained as

{μ̂_l}_{l=1}^{L̂} = {ν_k | ‖x̂_k‖_p > 0, k = 1, . . . , K},   (11)

where x̂_k corresponds to the kth row of the estimated signal matrix X̂ = [x̂_1, . . . , x̂_K]^T and L̂ denotes the number of non-zero rows in X̂, i.e., the estimated model order.
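In code, the ℓ2,1 objective (8) (with p = 2, q = 1) and the support rule (11) amount to a few NumPy operations; the following is a minimal sketch, where the threshold tol is an assumption, since numerical solvers return small but non-zero row-norms rather than exact zeros.

```python
import numpy as np

def l21_objective(X, A, Y, lam):
    """Evaluate eq. (8) for p = 2, q = 1."""
    N = Y.shape[1]
    row_norms = np.linalg.norm(X, axis=1)            # vector x^(2) of row norms, eq. (10)
    fit = 0.5 * np.linalg.norm(A @ X - Y, 'fro') ** 2
    return fit + lam * np.sqrt(N) * row_norms.sum()  # ||X||_{2,1}, eq. (9)

def estimate_frequencies(X, nu, tol=1e-6):
    """Union support set and frequency estimates according to eq. (11)."""
    return nu[np.linalg.norm(X, axis=1) > tol]
```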
One major drawback of the mixed-norm minimization problem in (8) lies in its computational cost, which is determined by the size of the K × N source signal matrix X. A large number of grid points K is desired to improve the frequency resolution, while a large number of measurement vectors N is desired to improve the estimation performance. However, the choice of too large values K and N makes the problem computationally intractable. To reduce the computational cost in the MMV problem it was suggested in [37] to reduce the dimension of the M × N measurement matrix Y by matching only the signal subspace in form of an M × L matrix Y_SV, leading to the prominent ℓ1-SVD method. A drawback of the ℓ1-SVD method is that it requires knowledge of the number of source signals and that the estimation performance may deteriorate in the case of correlated source signals. In [49], [52] a related dimensionality reduction approach was proposed. Instead of only matching the signal subspace, the authors propose to match the signal and noise subspace in form of an M × M matrix Y_RD. It was shown in [52] that matching the matrix Y_RD results in the same estimate of the sparse spatial spectrum as matching the original measurement matrix Y. In case of a large number of measurement vectors N > M, both dimensionality reduction approaches result in reduced computational cost since the dimension of the signal matrix X is equally reduced.

To achieve high frequency resolution it was further suggested in [37] to perform an adaptive grid refinement. In the special case of uniform linear arrays the ℓ2,1 mixed-norm minimization problem can equivalently be addressed in a gridless fashion by the atomic norm framework, discussed in the following section.

B. Atomic Norm Minimization

The concept of Atomic Norm Minimization (ANM) has been introduced in [16] as a unifying framework for different types of sparse recovery methods, such as ℓ1 norm minimization for sparse vector reconstruction or nuclear norm minimization for low-rank matrix completion. In [17]–[19] ANM was introduced for gridless line spectral estimation from SMVs in uniform linear arrays (ULAs). The extension of ANM to MMVs under this setup was studied in [30]–[32], which will be revised in the following. Consider L source signals with spatial frequencies μ_1, . . . , μ_L, impinging on a ULA with sensor positions ρ_m = m − 1, for m = 1, . . . , M. The noise-free measurement matrix obtained at the array output is modeled as Y_0 = Σ_{l=1}^L a(μ_l)ψ_l^T, where the samples of the lth source signal are contained in the N × 1 vector ψ_l. In the ANM framework [30]–[32], the measurement matrix Y_0 is considered as a convex combination of atoms a(ν)b^H with b ∈ C^N, ‖b‖_2 = 1 and ν ∈ [−1, 1), i.e., in contrast to the previous section the frequencies ν are continuous and not restricted to lie on a grid. The atomic norm of Y_0 is defined as

‖Y_0‖_A = inf_{c_k, b_k, ν_k} { Σ_k c_k : Y_0 = Σ_k c_k a(ν_k) b_k^H, c_k ≥ 0 }.   (12)

For the special case of ULAs, it was shown in [16]–[19], [30]–[32] that the atomic norm in (12) can equivalently be computed by the semidefinite program (SDP)

‖Y_0‖_A = inf_{v, V_N} (1/2) Tr(V_N) + (1/(2M)) Tr(Toep(v))   (13a)
s.t. [V_N, Y_0^H; Y_0, Toep(v)] ⪰ 0.   (13b)

Given a solution to problem (13) the reconstruction of the spatial frequencies ν_k and magnitudes c_k, for k = 1, . . . , K, is performed by means of the Vandermonde decomposition: For ULAs the M × K matrix A = [a(ν_1), . . . , a(ν_K)] has a Vandermonde structure such that the product a(ν_k)a^H(ν_k) exhibits a Toeplitz structure and

Σ_{k=1}^K c_k a(ν_k) a^H(ν_k) = Toep(v),   (14)

where Toep(v) denotes a Hermitian Toeplitz matrix with v as its first column. As discussed in [17], the Caratheodory theorem [53]–[55] states that any Toeplitz matrix Toep(v) of rank K ≤ M can be represented by a Vandermonde decomposition according to (14) for any K ≤ M distinct frequencies ν_1, . . . , ν_K and corresponding magnitudes c_1, . . . , c_K > 0. In practice, the Vandermonde decomposition for a Toeplitz matrix Toep(v) according to (14) can be obtained by first recovering the frequencies ν̂_k, e.g., by Prony's method [56], the matrix pencil approach [57] or linear prediction methods [58], where the frequency recovery is performed in a gridless fashion. The corresponding signal magnitudes in c = [c_1, . . . , c_K]^T can be reconstructed by solving the linear system

A(ν̂) c = v,   (15)
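The frequency recovery step behind (14)–(15) can be sketched as follows; as an assumption, an ESPRIT-style shift-invariance step is used here in place of Prony's method [56] or the matrix pencil approach [57] cited above, exploiting that for a ULA each steering vector satisfies a(ν)[1:] = e^{−jπν} a(ν)[:−1].

```python
import numpy as np

def vandermonde_decomposition(T, L):
    """Recover nu_k and c_k with sum_k c_k a(nu_k) a(nu_k)^H = Toep(v), cf. eq. (14)."""
    M = T.shape[0]
    w, V = np.linalg.eigh(T)
    Us = V[:, np.argsort(w)[-L:]]                  # principal eigenvectors span range(A(nu))
    Phi = np.linalg.pinv(Us[:-1, :]) @ Us[1:, :]   # shift invariance of the subspace
    nu = -np.angle(np.linalg.eigvals(Phi)) / np.pi
    A_nu = np.exp(-1j * np.pi * np.outer(np.arange(M), nu))
    c = np.real(np.linalg.lstsq(A_nu, T[:, 0], rcond=None)[0])  # A(nu) c = v, eq. (15)
    return nu, c
```

Here T stands for Toep(v) and L for its rank; the magnitudes follow from the first column of T as in (15).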

i.e., by exploiting that [a(ν)]_1 = 1, for all ν ∈ [−1, 1), and considering the first column in the representation (14).

As proposed in [30]–[32], given a noise-corrupted measurement matrix Y as defined in (2), gridless joint sparse recovery from MMVs can be performed by using (12) in the form of

min_{Y_0} (1/2)‖Y − Y_0‖_F² + λ√N ‖Y_0‖_A   (16)

or, equivalently, by using the SDP formulation in (13), as

min_{v, V_N, Y_0} (1/2)‖Y − Y_0‖_F² + (λ√N/2) (Tr(V_N) + (1/M) Tr(Toep(v)))   (17a)
s.t. [V_N, Y_0^H; Y_0, Toep(v)] ⪰ 0.   (17b)

Similar as for the ℓ2,1 mixed-norm minimization problem, the ANM problem suffers from a large number of optimization parameters in the matrix Y_0 in the case of a large number of MMVs N such that dimensionality reduction techniques similar to those discussed in Section III-A have been proposed to reduce the computational cost [49]. Additionally, the dimensions of the semidefinite constraint (17b) grow with the number of sensors M and MMVs N and the problem becomes intractable for large values of M and N. An implementation of the SDP based on the alternating direction method of multipliers (ADMM) has been proposed in [18], [59] to reduce the computational cost. However, for large problem sizes it was proposed in [60] to rather use grid-based formulations such as the ℓ2,1 mixed-norm minimization problem (8), which can be solved efficiently, rather than the SDP formulation in (17).

IV. SPARROW: A REFORMULATION OF THE ℓ2,1 MIXED-NORM MINIMIZATION PROBLEM

As discussed in Sections I and III, the MMV-based ℓ2,1 mixed-norm minimization problem is a well investigated problem with many fields of application. In this context, one of the main results of this manuscript is given by the following, novel problem reformulation:

Theorem 1: The row-sparsity inducing ℓ2,1 mixed-norm minimization problem

min_X (1/2)‖AX − Y‖_F² + λ√N ‖X‖_{2,1}   (18)

is equivalent to the convex problem

min_{S∈D+} Tr((ASA^H + λI_M)^{−1} R̂) + Tr(S),   (19)

with R̂ = Y Y^H/N denoting the sample covariance matrix and D+ describing the set of nonnegative diagonal matrices, in the sense that minimizers X̂ and Ŝ for problems (18) and (19), respectively, are related by

X̂ = ŜA^H(AŜA^H + λI_M)^{−1} Y.   (20)

A proof of the equivalence is provided in Appendix A, while a proof of the convexity of (19) is provided in Appendix C by showing positive semidefiniteness of the Hessian matrix.
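Given any minimizer Ŝ of (19), e.g., from the SDP implementations below or the coordinate descent method of Section V, the reconstruction (20) is a one-line computation; a minimal NumPy sketch:

```python
import numpy as np

def sparrow_reconstruct(A, Y, s_hat, lam):
    """Closed-form signal estimate of eq. (20): X = S A^H (A S A^H + lam*I)^{-1} Y."""
    M = A.shape[0]
    S_hat = np.diag(s_hat)                   # s_hat holds the diagonal of S
    W = A @ S_hat @ A.conj().T + lam * np.eye(M)
    return S_hat @ A.conj().T @ np.linalg.solve(W, Y)
```

Note that only an M × M system is solved, regardless of the grid size K or the number of MMVs N.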
In addition to (20), we observe that the matrix Ŝ = diag(ŝ_1, . . . , ŝ_K) contains the row-norms of the sparse signal matrix X̂ = [x̂_1, . . . , x̂_K]^T on its diagonal according to

ŝ_k = (1/√N) ‖x̂_k‖_2,   (21)

for k = 1, . . . , K, such that the union support of X̂ is equivalently represented by the support of the sparse vector of row-norms [ŝ_1, . . . , ŝ_K]. We will refer to (19) as SPARse ROW-norm reconstruction (SPARROW). In this regard, we emphasize that Ŝ should not be mistaken for a sparse representation of the source covariance matrix, i.e., Ŝ ≠ E{X̂X̂^H}/N. While the mixed-norm minimization problem in (18) has NK complex variables in X, the SPARROW problem in (19) provides a reduction to only K nonnegative variables in the diagonal matrix S. However, the union support of X̂ is similarly provided by Ŝ. Moreover, the SPARROW problem in (19) only relies on the sample covariance matrix R̂ instead of the MMVs in Y themselves, leading to a reduction in problem size, especially in the case of a large number of MMVs N. Interestingly, this also indicates that the union support of the signal matrix X̂ is fully encoded in the sample covariance matrix R̂, rather than the instantaneous MMVs in Y, as may be concluded from the ℓ2,1 formulation in (18). Similar observations were made in [52] in the context of dimensionality reduction. As seen from (20), the instantaneous MMVs in Y are only required for the signal reconstruction, which, in the context of array signal processing, can be interpreted as a form of beamforming [34], where the row-sparse structure in X̂ is induced by premultiplication with the sparse diagonal matrix Ŝ. In contrast to the dimensionality reduction techniques discussed in Section III-A, the proposed SPARROW formulation in (19) admits a reduced number of variables while providing the same solution as the original ℓ2,1 mixed-norm minimization problem in (18). In comparison, the ℓ1-SVD method in [37] requires a K × L matrix variable X_SV and thus has a significantly reduced number of parameters in case of a small number of sources L, but suffers from degraded estimation performance in case of incorrect subspace estimation. Conversely, the dimensionality reduction technique in [49], [52] provides the same estimation performance as the original ℓ2,1 mixed-norm minimization problem in (18), but requires a K × M matrix variable X_RD, i.e., it suffers from an increased number of parameters for a large number of sensors M, as compared to the SPARROW and ℓ1-SVD methods.

To show convexity of the SPARROW formulation (19) and for implementation with standard convex solvers, such as MOSEK [61], consider the following corollaries [62]:

Corollary 1: The SPARROW problem in (19) is equivalent to the semidefinite program (SDP)

min_{S, U_N} (1/N) Tr(U_N) + Tr(S)   (22a)
s.t. [U_N, Y^H; Y, ASA^H + λI_M] ⪰ 0   (22b)
S ∈ D+,   (22c)

where U_N is a Hermitian matrix of size N × N.

To see the equivalence of the two problems, note that in (22) ASA^H + λI_M ≻ 0 is positive definite, since S ⪰ 0 and λ > 0. Further consider the Schur complement of the constraint (22b) [62]:

U_N ⪰ Y^H(ASA^H + λI_M)^{−1} Y,   (23)

which implies that

(1/N) Tr(U_N) ≥ (1/N) Tr(Y^H(ASA^H + λI_M)^{−1} Y) = Tr((ASA^H + λI_M)^{−1} R̂).   (24)

For any optimal point Ŝ of (19) we can construct a feasible point of (22) with the same objective function value by choosing U_N = Y^H(AŜA^H + λI_M)^{−1} Y. Conversely, any optimal solution pair Û_N, Ŝ of (22) is also feasible for (19).

Corollary 2: The SPARROW formulation in (19) admits the equivalent problem formulation

min_{S, U_M} Tr(U_M R̂) + Tr(S)   (25a)
s.t. [U_M, I_M; I_M, ASA^H + λI_M] ⪰ 0   (25b)
S ∈ D+,   (25c)

where U_M is a Hermitian matrix of size M × M.

The proof of Corollary 2 follows the same line of arguments as in the proof of Corollary 1. In contrast to the constraint (22b), the dimension of the semidefinite constraint (25b) is independent of the number of MMVs N. It follows that either problem formulation (22) or (25) can be selected to solve the SPARROW problem in (19), depending on the number of MMVs N and the resulting dimension of the semidefinite constraint, i.e., (22) is preferable for N ≤ M and (25) is preferable otherwise. We remark that the SDP implementations in [32] have been derived using similar steps, i.e., employing the Schur complement to obtain linear matrix inequality constraints according to [62].
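For illustration, the covariance-based formulation (25) could be prototyped in CVXPY as follows; this is only a sketch under the assumption of a complex-SDP-capable solver, and the block constraint (25b) is encoded through a Hermitian slack variable Z so that the semidefinite constraint is stated on a variable that is Hermitian by declaration.

```python
import numpy as np
import cvxpy as cp

def sparrow_sdp(A, R_hat, lam):
    """Sketch of the SPARROW SDP (25); returns the diagonal of the minimizer S."""
    M, K = A.shape
    s = cp.Variable(K, nonneg=True)                   # S in D+, constraint (25c)
    U = cp.Variable((M, M), hermitian=True)           # U_M
    Z = cp.Variable((2 * M, 2 * M), hermitian=True)   # block matrix of (25b)
    Q = A @ cp.diag(s) @ A.conj().T + lam * np.eye(M)
    constraints = [Z >> 0,
                   Z[:M, :M] == U,
                   Z[:M, M:] == np.eye(M),
                   Z[M:, M:] == Q]
    objective = cp.Minimize(cp.real(cp.trace(U @ R_hat)) + cp.sum(s))
    cp.Problem(objective, constraints).solve()
    return s.value
```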
In the case of ULAs the steering matrix A has a Vandermonde structure and the matrix product ASA^H = Toep(u) forms a Toeplitz matrix, as discussed in Section III-B. Based on the uniqueness of the Vandermonde decomposition as discussed for (14), we rewrite problem (19) as the gridless (GL-)SPARROW formulation

min_u Tr((Toep(u) + λI_M)^{−1} R̂) + (1/M) Tr(Toep(u))   (26a)
s.t. Toep(u) ⪰ 0,   (26b)

where we additionally make use of the identity

Tr(S) = (1/M) Tr(ASA^H) = (1/M) Tr(Toep(u)),   (27)

with the factor 1/M resulting from ‖a(ν)‖_2² = M, for all ν ∈ [−1, 1). Given a minimizer û of problem (26), the number of sources, i.e., the model order, can be directly estimated as

L̂ = rank(Toep(û)),   (28)

while the frequencies {μ̂_l}_{l=1}^{L̂} and corresponding magnitudes {ŝ_l}_{l=1}^{L̂} can be estimated by Vandermonde decomposition according to (14). With the frequencies in {μ̂_l}_{l=1}^{L̂} and signal magnitudes in {ŝ_l}_{l=1}^{L̂}, the corresponding signal matrix X̂ can be reconstructed by application of (20).

We remark that a unique Vandermonde decomposition requires L̂ = rank(Toep(û)) < M. The rank L̂ can be interpreted as the counterpart of the number of non-zero elements in the minimizer Ŝ in the grid-based problems (22) and (25). Similarly as the regularization parameter λ determines the number of non-zero elements, i.e., the sparsity level of Ŝ, there always exists a value λ which yields a minimizer û of the gridless formulations (29) and (30) which fulfills L̂ = rank(Toep(û)) < M such that a unique Vandermonde decomposition is obtained. We provide a description for the appropriate choice of the regularization parameter λ in Section VII.

For using standard convex solvers we follow the ideas of Corollary 1 to reformulate (26) as the SDP

min_{u, U_N} (1/N) Tr(U_N) + (1/M) Tr(Toep(u))   (29a)
s.t. [U_N, Y^H; Y, Toep(u) + λI_M] ⪰ 0   (29b)
Toep(u) ⪰ 0.   (29c)

Alternatively, using the approach of Corollary 2, we define the gridless estimation problem

min_{u, U_M} Tr(U_M R̂) + (1/M) Tr(Toep(u))   (30a)
s.t. [U_M, I_M; I_M, Toep(u) + λI_M] ⪰ 0   (30b)
Toep(u) ⪰ 0.   (30c)
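In the same spirit, a CVXPY sketch of the gridless formulation (30) can parameterize Toep(u) directly as a Hermitian variable with constant diagonals; again, the block constraint (30b) is encoded via a Hermitian slack variable, and the solver choice is left open as an assumption.

```python
import numpy as np
import cvxpy as cp

def gl_sparrow_sdp(R_hat, lam):
    """Sketch of the gridless SPARROW SDP (30); returns Toep(u_hat)."""
    M = R_hat.shape[0]
    T = cp.Variable((M, M), hermitian=True)           # T plays the role of Toep(u)
    U = cp.Variable((M, M), hermitian=True)           # U_M
    Z = cp.Variable((2 * M, 2 * M), hermitian=True)   # block matrix of (30b)
    toeplitz = [T[i, j] == T[i - 1, j - 1]            # constant diagonals
                for i in range(1, M) for j in range(1, M)]
    constraints = toeplitz + [T >> 0,                 # constraint (30c)
                              Z >> 0,
                              Z[:M, :M] == U,
                              Z[:M, M:] == np.eye(M),
                              Z[M:, M:] == T + lam * np.eye(M)]
    objective = cp.Minimize(cp.real(cp.trace(U @ R_hat)) + cp.real(cp.trace(T)) / M)
    cp.Problem(objective, constraints).solve()
    return T.value
```

The model order then follows from the (numerical) rank of the returned matrix as in (28), and the frequencies and magnitudes from its Vandermonde decomposition, e.g., with the routine sketched in Section III-B.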
Comparing the GL-SPARROW formulation (29) and the ANM problem (17) we observe a similar structure in the objective functions and semidefinite constraints. In fact, both problems are equivalent as given by the following theorem:

Theorem 2: The atomic norm minimization problem (16) and the corresponding SDP implementation (17) with auxiliary variable v is equivalent to the gridless SPARROW formulation (29) in the sense that the corresponding minimizers are related by

û = v̂/√N.   (31)

A proof of Theorem 2 is given in Appendix B. For both problem formulations, GL-SPARROW (29) and ANM (17), the spatial frequencies ν are encoded in the vectors û and v̂, as found by Vandermonde decomposition (14), such that both formulations provide the same frequency estimates.

However, from a computational viewpoint, in contrast to the GL-SPARROW problem in (29), the ANM problem in (17) has additional MN complex variables in the matrix Y_0, which need to be matched to the MMV matrix Y by an additional quadratic term in the objective function. We remark that the

dimensionality reduction techniques for ANM, discussed in Section III, can similarly be applied to SPARROW. Hence, the GL-SPARROW formulations (29) and (30) admit significantly reduced computational cost as compared to the ANM formulation (17).

In [63] it was shown that the GL-SPARROW can similarly be applied to augmentable arrays, i.e., uniform linear arrays with missing sensors in specific positions. As shown in [63], the GL-SPARROW method outperforms state of the art methods for augmentable arrays in the case of coherent source signals. Moreover, the SPARROW formulation (19) can be adapted to perform gridless frequency estimation in shift-invariant arrays, as discussed in [64].

While the above discussion considers gridless frequency reconstruction by means of the primal SPARROW formulation (26) and its SDP reformulations (29) and (30), the gridless frequency reconstruction problem can also be addressed by means of dual polynomials [14], [15], [17] as discussed in [65]. The approach in [65] requires solving the dual problem of the SPARROW formulation in either (29) or (30), which forms a semidefinite program that can be solved by standard convex solvers. For the sake of brevity we omit a further discussion of frequency reconstruction via the dual problem and refer to [14], [15], [17], [65] for more details.

V. COORDINATE DESCENT IMPLEMENTATION OF THE SPARROW FORMULATION

For sensor arrays with a large number of sensors M, the SDP implementations in the previous section may become computationally intractable, due to the large dimension of the semidefinite matrix constraints. Similar observations have been made for the gridless atomic norm minimization problem, which likewise relies on an SDP implementation, such that in [18], [60] it was suggested to avoid gridless estimation in the case of large sensor arrays and to return to a grid-based implementation of SSR that avoids SDP, instead.

A particularly simple algorithm for solving the ℓ2,1 formulation (18) is the coordinate descent (CD) method [66], [67]. Its simplicity mainly lies in the closed-form and low-complexity solutions for the coordinate updates. However, the computational cost of the CD implementation of the conventional ℓ2,1 mixed-norm minimization problem (18) increases with the number of MMVs N. On the other hand, ignoring the comparably small overhead required for computing the sample covariance matrix R̂, the computational cost of the SPARROW formulation in (19) is independent of the number of MMVs N and, as we will show in this section, a simple CD implementation also exists for the SPARROW formulation which does not involve an explicit matrix inversion per CD iteration.

Consider a function f(S) which is jointly convex in the variables s_1, . . . , s_K. To be consistent with previous notation we summarize the variables in the diagonal matrix S = diag(s_1, . . . , s_K). Furthermore, consider uncoupled constraints of the form s_k ≥ 0, for k = 1, . . . , K. The CD method provides sequential and iterative coordinate updates, where coordinate s_k^{(τ)} in iteration τ is updated with the optimal stepsize d̂_k^{(τ)}, computed as

d̂_k^{(τ)} = arg min_d f(S_{k,τ} + d E_k)   (32a)
s.t. s_k^{(τ)} + d ≥ 0.   (32b)

In (32), the diagonal matrix

S_{k,τ} = diag(s_1^{(τ+1)}, . . . , s_{k−1}^{(τ+1)}, s_k^{(τ)}, . . . , s_K^{(τ)})   (33)

denotes the approximate solution for the minimizer of f(S) in iteration τ, before updating coordinate k, and the matrix E_k with elements

[E_k]_{m,n} = 1 if m = n = k, and [E_k]_{m,n} = 0 else,   (34)

denotes a selection matrix. Given the update stepsize d̂_k^{(τ)}, the coordinate update is performed according to

S_{k+1,τ} = S_{k,τ} + d̂_k^{(τ)} E_k.   (35)

Regarding the SPARROW problem in (19), the objective function of the subproblem in (32) is given as

f(S_{k,τ} + d E_k) = Tr((U_{k,τ} + d a_k a_k^H)^{−1} R̂) + Tr(S_{k,τ}) + d,   (36)

with a_k = a(ν_k) denoting the kth column of the M × K dictionary matrix A, computed from a fixed grid of frequencies ν_1, . . . , ν_K as discussed in Section III, and U_{k,τ} = AS_{k,τ}A^H + λI_M. Upon application of the matrix inversion lemma

(U_{k,τ} + d a_k a_k^H)^{−1} = U_{k,τ}^{−1} − (d U_{k,τ}^{−1} a_k a_k^H U_{k,τ}^{−1}) / (1 + d a_k^H U_{k,τ}^{−1} a_k)   (37)

and by exploiting the cyclic property of the trace operator, equation (36) can be rewritten as

f(S_{k,τ} + d E_k) = Tr(U_{k,τ}^{−1} R̂) − (d a_k^H U_{k,τ}^{−1} R̂ U_{k,τ}^{−1} a_k) / (1 + d a_k^H U_{k,τ}^{−1} a_k) + Tr(S_{k,τ}) + d.   (38)

The function f(S_{k,τ} + d E_k) in (38) behaves asymptotically linear in d and has stationary points in

d̃_{1,2} = (±√(a_k^H U_{k,τ}^{−1} R̂ U_{k,τ}^{−1} a_k) − 1) / (a_k^H U_{k,τ}^{−1} a_k),   (39)

symmetrically located around the simple pole in

d̃_0 = −1/(a_k^H U_{k,τ}^{−1} a_k) = −(1 + s_k^{(τ)} a_k^H U_{−k,τ}^{−1} a_k) / (a_k^H U_{−k,τ}^{−1} a_k) < 0,   (40)

where the last identity in (40) follows from the matrix inversion lemma applied to U_{k,τ}^{−1} = (U_{−k,τ} + s_k^{(τ)} a_k a_k^H)^{−1}, with U_{−k,τ} = A_{−k} S_{−k,τ} A_{−k}^H + λI_M, where A_{−k} = [a_1, . . . , a_{k−1}, a_{k+1}, . . . , a_K] and S_{−k,τ} = diag(s_1^{(τ+1)}, . . . , s_{k−1}^{(τ+1)}, s_{k+1}^{(τ)}, . . . , s_K^{(τ)}). Taking into account (40) and the constraint s_k^{(τ)} + d ≥ 0 in

(32b), it can easily be verified that the optimal stepsize must fulfill d̂_k^{(τ)} ≥ −s_k^{(τ)} > d̃_0, i.e., it must be located on the right hand side of the pole d̃_0, and the optimal stepsize according to (32) is computed as

d̂_k^{(τ)} = max( (√(a_k^H U_{k,τ}^{−1} R̂ U_{k,τ}^{−1} a_k) − 1) / (a_k^H U_{k,τ}^{−1} a_k), −s_k^{(τ)} ).   (41)

Given the stepsize d̂_k^{(τ)}, the variable update is performed according to (35). The matrix inverse U_{k+1,τ}^{−1}, including the updated coordinate s_k^{(τ+1)} = s_k^{(τ)} + d̂_k^{(τ)}, as required for updating the next coordinate s_{k+1}^{(τ)}, can be computed by the matrix inversion lemma as given in (37), which requires O(M²) operations. This results in significantly reduced computational cost, compared to explicit computation of the matrix inverse U_{k+1,τ}^{−1} = (AS_{k+1,τ}A^H + λI_M)^{−1}, which requires O(M³) operations.¹ The overall steps of our proposed CD method are summarized in Algorithm 1.

Algorithm 1: Coordinate Descent Method.
1: Initialize approximate solution S_{1,1} ← 0
2: Initialize matrix inverse U_{1,1}^{−1} ← (1/λ) I
3: Initialize iteration index τ ← 1
4: repeat
5:   for k ← 1, . . . , K do
6:     Compute stepsize d̂_k^{(τ)} by eq. (41)
7:     Update approximate solution S_{k+1,τ} by eq. (35)
8:     Update matrix inverse U_{k+1,τ}^{−1} by eq. (37)
9:   end for
10:  Update matrix inverse U_{1,τ+1}^{−1} ← U_{K+1,τ}^{−1}
11:  Update iteration index τ ← τ + 1
12: until convergence

We remark that in a practical implementation only the M × M Hermitian matrix U_{k+1,τ}^{−1} as well as the diagonal elements in S_{k,τ} need to be stored and updated over coordinates k and iterations τ, and that the computation time of the CD method can be drastically reduced if the sparsity in S_{k,τ} is exploited, by excluding zero elements in S_{k,τ} from the computation. The proposed CD implementation of SPARROW can be implemented with about 3M² + 2M complex multiplications and additions per coordinate and iteration. In the undersampled case, with N < M, the number of operations can be further reduced by replacing

√(a_k^H U_{k,τ}^{−1} R̂ U_{k,τ}^{−1} a_k) = ‖a_k^H U_{k,τ}^{−1} Y‖_2 / √N   (42)

in the update stepsize computation (41). The CD implementation of the ℓ2,1 mixed-norm minimization problem [66] requires in the order of 3MN + 2N complex multiplications and additions per coordinate and iteration, such that it has a computational cost comparable to that of the SPARROW formulation. However, as we will show by numerical experiments in Section VII, the CD implementation of the SPARROW formulation provides a better convergence rate.

¹To reduce the effect of numerical error propagation it is advisable to compute the matrix inverse U_{k,τ}^{−1} = (AS_{k,τ}A^H + λI_M)^{−1} in closed form after a number of rank-one updates, depending on the variable precision and desired accuracy of the solution. From our experiments in Matlab with double precision floating-point numbers we found that a closed form computation after every 100 CD iterations achieves good reconstruction performance.
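Putting (35), (37) and (41) together, Algorithm 1 admits a compact NumPy transcription; the following sketch uses a fixed number of sweeps and a periodic closed-form re-inversion (cf. footnote 1) as assumptions, in place of the relative-objective stopping rule used in Section VII.

```python
import numpy as np

def sparrow_cd(A, Y, lam, n_sweeps=100, reinv_every=100):
    """Coordinate descent for the SPARROW problem (19), cf. Algorithm 1."""
    M, K = A.shape
    N = Y.shape[1]
    R_hat = Y @ Y.conj().T / N
    s = np.zeros(K)                      # diagonal of S, initialized as S <- 0
    U_inv = np.eye(M) / lam              # (A S A^H + lam*I)^{-1} for S = 0
    for tau in range(n_sweeps):
        for k in range(K):
            Ua = U_inv @ A[:, k]         # U^{-1} a_k (U^{-1} is Hermitian)
            aUa = np.real(A[:, k].conj() @ Ua)
            aURUa = np.real(Ua.conj() @ R_hat @ Ua)
            d = max((np.sqrt(aURUa) - 1.0) / aUa, -s[k])   # stepsize, eq. (41)
            if d != 0.0:
                s[k] += d                                  # coordinate update, eq. (35)
                U_inv -= (d / (1.0 + d * aUa)) * np.outer(Ua, Ua.conj())  # eq. (37)
        if (tau + 1) % reinv_every == 0:  # curb numerical error propagation, footnote 1
            U_inv = np.linalg.inv(A @ np.diag(s) @ A.conj().T + lam * np.eye(M))
    return s
```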
The basic convergence of the proposed SPARROW CD method is guaranteed by the following result:

Proposition 1 ([68, Pr. 3.7.1]): Suppose that f(S) in (32) is continuously differentiable over the set S ⪰ 0. Furthermore, suppose that for each S = diag(s_1, . . . , s_K) ⪰ 0 and k,

f(s_1, . . . , s_{k−1}, ξ, s_{k+1}, . . . , s_K)   (43)

viewed as a function of ξ, attains a unique minimum ξ̂ > 0, and is monotonically nonincreasing in the interval from s_k to ξ̂. Let {S^{(τ)}} be the sequence generated by the coordinate descent method in Algorithm 1. Then, every limit point of {S^{(τ)}} is a stationary point.

The assumptions of the uniqueness of the minimum and the monotonic nonincrease of f(S) in Proposition 1 are satisfied for our proposed approach because f(S) is strictly convex in each component when all other components are held fixed, as discussed in Appendix C.

VI. RELATION TO THE SPICE METHOD

The SParse Iterative Covariance-based Estimation (SPICE) method [44]–[46] seeks to match the sample covariance matrix R̂ = Y Y^H/N with a sparse representation of the covariance matrix R_0, as shortly reviewed in the following.

The signal model Y = A(μ)Ψ + N, as defined in (2), admits the covariance matrix

R = E{Y Y^H}/N = A(μ)ΦA^H(μ) + σ²I_M.   (44)

In contrast to our consideration, the authors in [44]–[46] explicitly assume that the signals in Ψ are uncorrelated, such that the source covariance matrix

Φ = E{Ψ Ψ^H}/N   (45)

has a diagonal structure, i.e., Φ = diag(φ_1, . . . , φ_L). The sparse representation R_0 of the covariance matrix in (44) is introduced as

R_0 = AP A^H + εI_M,   (46)

where A denotes the dictionary matrix computed for a fixed grid of frequencies ν_1, . . . , ν_K, as used in (5), ε = σ² denotes the noise power and the elements of the sparse diagonal source covariance matrix P = diag(p_1, . . . , p_K) ∈ D+ are given as

p_k = φ_l if ν_k = μ_l, and p_k = 0 else,   (47)

for k = 1, . . . , K and l = 1, . . . , L, with φ_l denoting the diagonal elements of the source covariance as defined in (45).

Two types of weighted covariance matching functions have been proposed in [44]–[46]. The undersampled case, with N < M, is treated by minimization of a weighted covariance

matching function according to

min_{P∈D+, ε≥0} { ‖R_0^{−1/2}(R̂ − R_0)‖_F² : (46) } = min_{P∈D+, ε≥0} { Tr(R_0^{−1}R̂²) + Tr(R_0) − 2 Tr(R̂) : (46) },   (48)

where sparsity in P is induced in the objective of (48) in form of the trace penalty term Tr(R_0), as can be observed from the following identity:

Tr(R_0) = Mε + Σ_{k=1}^K ‖a_k‖_2² · p_k = Mε + M Σ_{k=1}^K p_k.   (49)

The oversampled case, with N ≥ M, where the sample covariance matrix R̂ is non-singular, is treated by the minimization of the weighted covariance matching function according to

min_{P∈D+, ε≥0} { ‖R_0^{−1/2}(R̂ − R_0)R̂^{−1/2}‖_F² : (46) } = min_{P∈D+, ε≥0} { Tr(R_0^{−1}R̂) + Tr(R_0R̂^{−1}) − 2M : (46) },   (50)

where sparsity in P is induced by summation of its diagonal elements with data dependent nonnegative weights according to

Tr(R_0R̂^{−1}) = ε Tr(R̂^{−1}) + Σ_{k=1}^K a_k^H R̂^{−1} a_k · p_k.   (51)

We remark that our proposed SPARROW formulation in (19) exhibits similarities with both SPICE formulations (48) and (50). While the SPARROW formulation shares the uniformly weighted summation of its variables in Tr(S) with the SPICE formulation in (48), it shares the structure of the data fitting function Tr((ASA^H + λI_M)^{−1}R̂) with the SPICE formulation in (50). There is, however, a fundamental difference between the SPARROW formulation and the SPICE formulations in the fact that the variables in S correspond to the normalized row-norms of the signal matrix, i.e., ŝ_k = (1/√N)‖x̂_k‖_2, for k = 1, . . . , K, as seen from (21), while the variables in P correspond to the signal powers, i.e., p̂_k = (1/N) E{‖x̂_k‖_2²}, for k = 1, . . . , K, as seen from (45) and (47).
Related links between SPICE and ℓ2,1 mixed-norm minimization have been presented, e.g., in [47], [48], where it has been shown that for the case of a single measurement vector y the SPICE problem in (48) is equivalent to the square-root LASSO (SR-LASSO) [69]

min_x ‖Ax − y‖_2 + ‖x‖_1   (52)

in the sense that the corresponding minimizers are related by

x̂ = P̂A^H(AP̂A^H + ε̂I)^{−1} y and p̂_k = |x̂_k| ‖y‖_2 / √M.   (53)

Similarly, it was shown in [45] that the SPICE formulation in (50) is equivalent to a weighted SR-LASSO formulation. We point out that the line of arguments used in [45], [47], [48] to prove the above mentioned equivalences is rather different from those used in our proof of Theorem 1. Furthermore, there are some significant differences between the SR-LASSO formulation (52) and the standard mixed-norm formulation (18) considered here. The latter reduces to the popular standard LASSO [2] in the special case of a single measurement vector. As compared to the SR-LASSO (52), the standard LASSO has a squared data fitting term, such that for additive white Gaussian noise the standard LASSO admits an interpretation as a Bayesian estimator with Laplacian priors [2], [70]. Equivalence of the standard LASSO and the SR-LASSO only holds in the noise-free case, such that in this case the SPICE formulation in (48) is equivalent to standard ℓ1 norm minimization. In contrast to that, the SPARROW formulation is equivalent to the standard mixed-norm minimization problem in the general and practically relevant case of noisy measurements.

Another major difference between the mixed-norm minimization problem in (18) and the SR-LASSO formulation in (52) lies in the absence of the regularization parameter λ in the latter approach. The mixed-norm problem (18) admits to obtain a solution of any desired sparsity level by tuning the regularization parameter λ, e.g., by exploiting a-priori knowledge or by applying blind techniques such as the cross validation approach of [2]. The SR-LASSO in (52) does not have such a regularization parameter and thus provides less flexibility in the solution. On the other hand, since the selection of the regularization parameter can be quite challenging in practice, this makes the SR-LASSO, and correspondingly the SPICE method, easily applicable in practical scenarios [44]–[46], [71].

A gridless extension of SPICE to the GridLess SPICE (GLS) method for ULAs was proposed in [32], which relies on an SDP formulation of the SPICE problems (48) and (50), and Vandermonde decomposition of Toeplitz matrices, similar to the ANM and SPARROW problems discussed in Sections III-B and IV. In [32], [72] it has been shown that GLS can be interpreted as special versions of noise-free ANM (13). In contrast to the results in [32], [72], our results on the equivalence between gridless SPARROW and ANM in Section III-B hold in the more general case with an additional data matching term in the ANM formulation to account for noise-corrupted measurements according to (17).

VII. NUMERICAL EXPERIMENTS

The parameter estimation performance of ℓ2,1 mixed-norm minimization, ANM and SPICE has been numerically investigated in various publications, e.g., [30]–[32], [37], [38], [44]–[46]. Instead, we provide a comparison of the computation time for the equivalent approaches discussed in this paper.

Regarding the choice of the regularization parameter in the SPARROW formulation, we follow the heuristic approach of selecting

λ = σ√(M log M),   (54)

as suggested for the single measurement vector problem in [18], which has provided good estimation performance for the scenarios investigated in this manuscript.

All simulations are performed in Matlab on a computer with an Intel Core i7-4770 CPU @ 3.40 GHz × 8 and 16 GByte

RAM. For evaluation of the SDP reformulations of the SPARROW problem we employ the multi-purpose solver MOSEK [61] with the CVX MATLAB interface [73], [74]. For evaluation of the coordinate descent (CD) method proposed in Section VII-C, we employ a C/C++ implementation of the CD methods for the SPARROW formulation and the ℓ2,1 mixed-norm minimization problem [66], respectively. To reduce the computational cost in both CD methods, zero coordinates s_k^{(τ_0)} = 0 and x_k^{(τ_0)} = 0 are excluded from computation in future iterations τ > τ_0. In all experiments, both CD methods are initialized with zero matrices, i.e., S^{(0)} = 0 and X^{(0)} = 0. We assume that convergence is achieved for both CD methods if the relative change of the objective function value f^{(τ)} in iteration τ fulfills |f^{(τ)} − f^{(τ−1)}|/f^{(τ)} ≤ 10^{−12}.

Fig. 3. Average CPU time for equivalent methods with varying number of measurement vectors.

Fig. 4. Average CPU time for equivalent methods with varying number of sensors.

A. Number of Measurement Vectors

We consider a scenario with L = 3 independent complex Gaussian sources with static spatial frequencies μ_1 = −0.1, μ_2 = 0.35 and μ_3 = 0.5 and a ULA with M = 10 sensors. The signal-to-noise ratio (SNR) is fixed at SNR = 10 dB while the number of MMVs N is varied. Fig. 3 shows the average CPU time of ℓ2,1 mixed-norm minimization (18), the SPARROW formulations (22) and (25), atomic norm minimization (ANM) (17) and GL-SPARROW (29) and (30). For the grid-based methods we use a grid of size K = 1000.

Regarding the CPU time for the grid-based methods it can be seen that the SPARROW formulation (22) outperforms the ℓ2,1 mixed-norm minimization (18) for N < 30 MMVs. For larger numbers of MMVs the dimensions of the semidefinite constraint (22b) become too large such that the computational cost is increased as compared to ℓ2,1 mixed-norm minimization (18). The SPARROW formulation (25) is based on the sample covariance matrix and thus the computational cost is independent of the number of MMVs. For the gridless methods, Fig. 3 clearly displays that the CPU time of the GL-SPARROW formulation (29) is significantly reduced as compared to the ANM formulation (17). Similar as for the grid-based case, the CPU time of the covariance-based GL-SPARROW formulation (30) is relatively independent of the number of MMVs N and outperforms the other methods for large numbers of MMVs N. Independent of the number of MMVs, the gridless SPARROW formulations (29) and (30) clearly outperform their grid-based counterparts (22) and (25). Comparing the coordinate descent implementations of the ℓ2,1 mixed-norm minimization problem and the SPARROW formulation in Fig. 3, it can be seen that the ℓ2,1 CD method has the highest computation time among all methods under consideration for all MMV numbers N ≤ 40, and the computation time increases with the number of MMVs, while the computation time of the SPARROW CD implementation is slightly lower than the grid-based MOSEK implementation and almost independent of the number of MMVs.

The experiment shows that all the methods employing the raw measurements in Y, i.e., ℓ2,1 mixed-norm minimization (18), the ℓ2,1 CD method, the SPARROW formulation (22), the ANM formulation (17) and the GL-SPARROW formulation (29), suffer from increased computation time in the case of a large number of measurement vectors N, demonstrating the necessity of dimensionality reduction techniques, as will be investigated in the following experiment.

B. Number of Sensors

We keep the scenario from the previous section with L = 3 source signals and fix the number of MMVs as N = 50 while varying the number of sensors M in the ULA. Fig. 4 displays the average CPU time for the various equivalent methods under investigation. To reduce the computational cost in the methods based on the M × N raw measurement matrix Y we perform dimensionality reduction according to [49] to match a matrix Y_RD of dimensions M × M instead, as discussed in Section III. Using the dimensionality reduction technique, it can be seen that both grid-based SPARROW formulations (22) and (25) have the same computational cost, since the dimensions of the semidefinite constraints are identical. For M ≤ 18 sensors the grid-based SPARROW formulations outperform the ℓ2,1 mixed-norm minimization (18). However, for M > 18 the dimensions of the semidefinite constraints in the SPARROW formulations become too large such that the computational cost exceeds that of the ℓ2,1 mixed-norm minimization (18).

Similar as for the grid-based SPARROW, the gridless SPARROW formulations (29) and (30) show identical performance, due to the identical size of the semidefinite constraints. Both gridless SPARROW formulations clearly outperform the ANM

approach (17), especially for large numbers of sensors M. This can be explained by the additional M² complex variables in the matrix Y_0 of the ANM formulation (17).

With respect to the CD implementations it can be observed from Fig. 4 that both the ℓ2,1 and the SPARROW CD method show high computation time for a low number of sensors, which can be explained by a high correlation of the atoms in the dictionary matrix A. With an increasing number of sensors the atoms become less correlated and the computation time reduces to a constant value for the considered number of sensors, where the computation time of the SPARROW CD method is significantly lower than that of the ℓ2,1 CD method.

Fig. 5. Convergence rate of the coordinate descent implementations of SPARROW and ℓ2,1 mixed-norm minimization for varying number of source signals L, frequency grid points K, sensors M and MMVs N, as well as varying SNR, and the resulting runtimes t_SP and t_{ℓ2,1} of the SPARROW and ℓ2,1 CD methods.

C. Coordinate Descent Method

As seen in the last section, the computational cost of the grid-based SPARROW formulations exceeds that of ℓ2,1 mixed-norm minimization for a large number of sensors, when the SDP formulations are used with the MOSEK solver. To deal with this problem we have presented a low complexity coordinate descent (CD) implementation in Section V which exploits the special structure of our proposed SPARROW formulation.

For evaluation of the proposed SPARROW CD method and for comparison to the CD method for ℓ2,1 mixed-norm minimization (ℓ2,1 CD) [66], Fig. 5 displays the convergence rate of the two methods for various scenarios in terms of the objective function value f^{(τ)} in iteration τ as compared to the optimum value f̂ of the objective function². To allow comparison with the computation of the SDP formulations in the previous sections, the corresponding runtimes are provided below the plots, where t_SP and t_{ℓ2,1} denote the CPU time of the SPARROW and
the ℓ2,1 CD methods, respectively. For all scenarios, the sensor positions ρ_m, for m = 1, ..., M, are selected uniformly at random in the interval [0, M], while the spatial frequencies μ_l, for l = 1, ..., L, are selected uniformly at random in the interval [−1, 1), with a minimum spacing of min_{l≠k} |μ_l − μ_k| ≥ 0.02 for l, k = 1, ..., L.

Figs. 5(a)–(c) illustrate how the convergence rates of the two CD methods reduce with an increasing number of grid points, which can be explained by the corresponding increase in the correlation of the atoms in the dictionary matrix A. The convergence behavior for a varying number of MMVs is illustrated in Figs. 5(d)–(f), showing that the convergence rate of both CD methods is essentially independent of the number of MMVs. However, comparing the runtimes of both CD methods, it can be observed that the ℓ2,1 CD method requires a higher computation time due to the increased number of operations as compared to the SPARROW CD method. In contrast to that, Figs. 5(g)–(i) show that the convergence rates of the CD methods slightly decrease with increasing SNR, especially for the ℓ2,1 method. This effect can be explained by the corresponding change of the regularization parameter according to (54), where a higher SNR results in a smaller regularization parameter λ, which in turn causes a reduced convergence rate. The effect of a varying number of sensors is displayed in Figs. 5(j)–(l), where it can be observed that the convergence rate improves with an increasing number of sensors. As discussed for Figs. 4 and 5(a)–(c), this effect can be explained by the reduced correlation of the atoms in the dictionary matrix A for a larger number of sensors M at a constant number of atoms K. Clearly, the results show that for all scenarios SPARROW CD outperforms the ℓ2,1 CD method, in convergence rate as well as in runtime.
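To make the structure of such a CD method concrete, the sketch below implements coordinate sweeps for the SPARROW problem (19) (i.e., (63) without the constant factor λN/2): holding all other coordinates fixed, the objective is strictly convex in the single coordinate s_k (cf. Appendix C), and the matrix inversion lemma yields a closed-form coordinate minimizer. This is an illustrative reimplementation derived from the formulas in this paper, not necessarily the exact code evaluated in Fig. 5; the steering matrix in the usage example is a hypothetical ULA dictionary.

```python
import numpy as np

def sparrow_cd(A, R_hat, lam, n_sweeps=100):
    """Coordinate descent sketch for the SPARROW problem (19),
        min_{s >= 0}  Tr((A diag(s) A^H + lam*I)^{-1} R_hat) + 1^T s.
    The closed-form coordinate minimizer follows from the matrix
    inversion lemma; illustrative only."""
    M, K = A.shape
    s = np.zeros(K)
    Qinv = np.eye(M, dtype=complex) / lam            # Q^{-1} for Q = lam*I
    for _ in range(n_sweeps):
        for k in range(K):
            a = A[:, k]
            qa = Qinv @ a
            # Sherman-Morrison downdate: remove s_k * a a^H from Q
            denom = 1.0 - s[k] * np.real(np.vdot(a, qa))
            qka = qa / denom                          # = Q_{-k}^{-1} a
            c = np.real(np.vdot(a, qka))              # a^H Q_{-k}^{-1} a > 0
            b = np.real(np.vdot(qka, R_hat @ qka))    # a^H Q_{-k}^{-1} R Q_{-k}^{-1} a
            s_k = max(0.0, (np.sqrt(b) - 1.0) / c)    # closed-form minimizer
            # rank-one updates of Q^{-1}: first remove old s_k, then add new
            Qkinv = Qinv + (s[k] / denom) * np.outer(qa, qa.conj())
            Qinv = Qkinv - (s_k / (1.0 + s_k * c)) * np.outer(qka, qka.conj())
            s[k] = s_k
    return s

# usage with a hypothetical ULA dictionary on a K-point frequency grid
rng = np.random.default_rng(1)
M, K, N, lam = 10, 200, 50, 0.5
grid = np.linspace(-1.0, 1.0, K, endpoint=False)
A = np.exp(1j * np.pi * np.outer(np.arange(M), grid))
Y = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
s_hat = sparrow_cd(A, (Y @ Y.conj().T) / N, lam)
```

Maintaining Q^{-1} explicitly under rank-one updates keeps the cost per coordinate at O(M²), independent of the grid size K beyond the loop over the coordinates, which is one way to realize the low complexity referred to above.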
VIII. CONCLUSION

We have considered the classical ℓ2,1 mixed-norm minimization problem for jointly sparse signal reconstruction from multiple measurement vectors and derived an equivalent, compact reformulation with significantly reduced problem dimension. In our compact reformulation, which we refer to as SPARROW (SPARse ROW norm reconstruction), the variables represent the row-norms of the jointly sparse signal representation, while the measurements are compactly represented in form of the sample covariance matrix. For the special case of uniform linear sampling we presented a gridless SPARROW implementation and we have established exact equivalence between the gridless SPARROW formulation and the recently proposed atomic norm minimization problem for multiple measurement vectors. However, in contrast to atomic norm minimization, our gridless SPARROW implementation shows reduced problem size, resulting in significantly reduced computational cost. The proposed SPARROW formulations admit implementation by semidefinite programming, which becomes computationally expensive in large sampling scenarios. To reduce the computational cost in such large and possibly irregular sampling scenarios we have presented a low complexity implementation of the SPARROW formulation by the coordinate descent method. In our numerical evaluation we have demonstrated that the SPARROW formulation provides significant savings in computational cost as compared to ℓ2,1 mixed-norm and atomic norm minimization, when applied in standard convex solvers and coordinate descent methods.

APPENDIX A
EQUIVALENCE OF SPARROW AND ℓ2,1 MIXED-NORM MINIMIZATION

Proof of Theorem 1: A key component in establishing the equivalence of (18) and (19) is the observation that the ℓ2 norm of a vector x_k can be rewritten as

$$\|x_k\|_2 = \min_{\gamma_k,\, g_k} \; \frac{1}{2}\left(|\gamma_k|^2 + \|g_k\|_2^2\right) \tag{55a}$$

$$\text{s.t.} \quad \gamma_k\, g_k = x_k, \tag{55b}$$

where γ_k is a complex scalar and g_k is a complex vector of dimension N × 1, similar to x_k. For the optimal solution of (55), it holds that

$$\|x_k\|_2 = |\gamma_k|^2 = \|g_k\|_2^2. \tag{56}$$

To see this, consider that any feasible solution must fulfill

$$\|x_k\|_2 = |\gamma_k|\,\|g_k\|_2 \leq \frac{1}{2}\left(|\gamma_k|^2 + \|g_k\|_2^2\right), \tag{57}$$

which constitutes the inequality of arithmetic and geometric means, with equality holding if and only if |γ_k| = ‖g_k‖_2.

We can extend the idea in (55) to the ℓ2,1 mixed-norm of the source signal matrix X = [x_1, ..., x_K]^T composed of rows x_k, for k = 1, ..., K, by

$$\|X\|_{2,1} = \sum_{k=1}^{K} \|x_k\|_2 = \min_{\Gamma \in \mathcal{D},\, G} \; \frac{1}{2}\left(\|\Gamma\|_F^2 + \|G\|_F^2\right) \tag{58a}$$

$$\text{s.t.} \quad X = \Gamma G, \tag{58b}$$

where Γ = diag(γ_1, ..., γ_K) is a K × K complex diagonal matrix and G = [g_1, ..., g_K]^T is a K × N complex matrix with rows g_k, for k = 1, ..., K. After inserting (58) into the ℓ2,1 mixed-norm minimization problem in (18), we formulate the minimization problem

$$\min_{\Gamma \in \mathcal{D},\, G} \; \frac{1}{2}\,\|A\Gamma G - Y\|_F^2 + \frac{\lambda\sqrt{N}}{2}\left(\|\Gamma\|_F^2 + \|G\|_F^2\right). \tag{59}$$

For a fixed matrix Γ, the minimizer Ĝ of problem (59) admits the closed form expression

$$\hat{G} = \left(\Gamma^H A^H A\Gamma + \lambda\sqrt{N}\, I_K\right)^{-1} \Gamma^H A^H Y = \Gamma^H A^H \left(A\Gamma\Gamma^H A^H + \lambda\sqrt{N}\, I_M\right)^{-1} Y, \tag{60}$$

where the last identity is derived from the matrix inversion lemma. Reinserting the optimal matrix Ĝ into equation (59) and performing basic reformulations of the objective function results in the compact minimization problem

$$\min_{\Gamma \in \mathcal{D}} \; \frac{\lambda\sqrt{N}}{2}\left(\operatorname{Tr}\!\left(\left(A\Gamma\Gamma^H A^H + \lambda\sqrt{N}\, I_M\right)^{-1} Y Y^H\right) + \operatorname{Tr}\!\left(\Gamma\Gamma^H\right)\right). \tag{61}$$
Upon substituting Y Y^H = N R̂ and defining the nonnegative diagonal matrix

$$S = \Gamma\Gamma^H/\sqrt{N} \in \mathcal{D}_{+} \tag{62}$$

we can rewrite (61) as the problem

$$\min_{S \in \mathcal{D}_{+}} \; \frac{\lambda N}{2}\left(\operatorname{Tr}\!\left(\left(A S A^H + \lambda I_M\right)^{-1}\hat{R}\right) + \operatorname{Tr}(S)\right). \tag{63}$$

Ignoring the factor λN/2 in (63), we arrive at formulation (19). From equation (56) and the definition of S = diag(s_1, ..., s_K) in (62) we furthermore conclude that

$$s_k = \frac{1}{\sqrt{N}}\,\|x_k\|_2, \tag{64}$$

for k = 1, ..., K, as given by (21). Making further use of the factorization in (58b) we obtain

$$\hat{X} = \hat{\Gamma}\hat{G} = \hat{\Gamma}\hat{\Gamma}^H A^H\left(A\hat{\Gamma}\hat{\Gamma}^H A^H + \lambda\sqrt{N}\, I_M\right)^{-1} Y = \hat{S}A^H\left(A\hat{S}A^H + \lambda I_M\right)^{-1} Y, \tag{65}$$

which is (20). ∎
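Both ingredients of the proof lend themselves to quick numerical sanity checks: the minimum in (55) is attained by γ_k = ‖x_k‖₂^{1/2} and g_k = x_k/γ_k, which satisfies (56), and the two expressions for Ĝ in (60) coincide by the push-through form of the matrix inversion lemma. A small, self-contained numpy sketch (illustrative only, with c standing in for λ√N):

```python
import numpy as np
rng = np.random.default_rng(2)

# (55)-(56): the minimum value ||x||_2 is attained by
# gamma = sqrt(||x||_2) and g = x / gamma
x = rng.standard_normal(20) + 1j * rng.standard_normal(20)
gamma = np.sqrt(np.linalg.norm(x))
g = x / gamma
print(np.isclose(0.5 * (abs(gamma) ** 2 + np.linalg.norm(g) ** 2),
                 np.linalg.norm(x)))                           # True

# (60): both closed-form expressions agree, with B = A @ Gamma
M, K, N, c = 6, 10, 4, 0.7
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
Gamma = np.diag(rng.standard_normal(K) + 1j * rng.standard_normal(K))
Y = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
B = A @ Gamma
lhs = np.linalg.solve(B.conj().T @ B + c * np.eye(K), B.conj().T @ Y)
rhs = B.conj().T @ np.linalg.solve(B @ B.conj().T + c * np.eye(M), Y)
print(np.allclose(lhs, rhs))                                   # True
```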

APPENDIX B Considering constraint (17b) of the ANM problem and inserting


EQUIVALENCE OF SPARROW AND ANM (69) and (73) we see that
 −1
Proof of Theorem 2: Consider the GL-SPARROW formula- VN YH 1 Y H Toep(û) + λI M
0
=√ √
tion Y 0 Toep(v) N NI
λ  λN   −1 H
min Tr U N + Tr Toep(u) (66a) Y H Toep(û) + λI M
u,U N 2 2M ×Toep(û) √ 0,
√ NI
UN / N YH (74)
s.t. √ √ 0 (66b)
Y N Toep(u) + Îť N I M
i.e., from the minimizers (û, Û N ) of the GL-SPARROW prob-
Toep(u) 0, (66c) lem we can construct a feasible and optimal point (V̂ N , v̂, Ŷ 0 )
for the ANM formulation, which concludes the proof. 
and the ANM formulation in (17). Both problems are equivalent
in the sense that the minimizers are related by APPENDIX C
√ CONVEXITY OF THE SPARROW PROBLEM
v̂ = N û (67)
Consider the objective function of the SPARROW formula-
√ 1
Û N = N V̂ N + (Y − Ŷ 0 )H (Y − Ŷ 0 ) (68) tion in (19)
Îť
 −1
f (s) = Tr(Q−1 R̂) +1T s, (75)
Ŷ 0 = Toep(û) Toep(û) + λI M Y. (69)
where Q = A diag(s)AH + ÎťI, with s = [s1 , . . . , sK ]T , and
Inserting (67) and (68) into the objective function (66a) it can 1 is a vector of ones. Using the elementwise derivatives
easily be verified that both problems achieve the same minimum
∂ 
K
∂Q
value. It remains to show that the optimal point (Û N, û) of the = si ai aH
i + ÎťI = ak ak
H
(76)
GL-SPARROW formulation (66) is feasible for the ANM formu- ∂sk ∂sk i=1
lation (17), and, conversely, that the optimal point (V̂ N , v̂, Ŷ 0 )
∂Q−1 ∂Q −1
of the ANM formulation (17) is feasible for the GL-SPARROW = −Q−1 Q = −Q−1 ak aH
kQ
−1
(77)
∂sk ∂sk
formulation (66).
We first show that the optimal point (V̂ N , v̂, Ŷ 0 ) of the ANM the gradient of (75) is given as
formulation (17) is feasible for the GL-SPARROW formulation ∂f (s) 
(66). Defining Z = Y − Ŷ 0 and inserting (67) and (68) into = 1 − vecd AH Q−1 RQ−1 A , (78)
∂s
where vecd(X) denotes the vector containing the elements on the main diagonal of matrix X. The Hessian matrix of (75) is computed as

$$\frac{\partial^2 f(s)}{\partial s\,\partial s^T} = 2\operatorname{Re}\!\left(\left(A^H Q^{-1} A\right)^T \odot \left(A^H Q^{-1}\hat{R}\, Q^{-1} A\right)\right), \tag{79}$$

with ⊙ denoting the Hadamard product, i.e., elementwise multiplication. From the Schur product theorem [75] it can be concluded that the Hessian matrix in (79) is positive semidefinite, since for S = diag(s_1, ..., s_K) ⪰ 0 it holds that Q ≻ 0. In other words, the SPARROW formulation in (75), and (19), respectively, is convex for nonnegative diagonal matrices S.

Considering only a single component s_k of the objective (75), we obtain the second order derivative

$$\frac{\partial^2 f(s)}{\partial s_k^2} = 2\left(a_k^H Q^{-1} a_k\right)\cdot\left(a_k^H Q^{-1}\hat{R}\, Q^{-1} a_k\right), \tag{80}$$

which is strictly greater than zero, i.e., the objective function of the SPARROW formulation is strictly convex in its single components.
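The gradient (78) and the single-coordinate curvature (80) can be validated against finite differences, which also serves as a quick sanity check when implementing gradient-based or coordinate descent schemes for (19). A minimal numpy sketch, with a random Hermitian positive semidefinite stand-in for R̂:

```python
import numpy as np
rng = np.random.default_rng(4)

M, K, lam = 5, 8, 0.3
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
W = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = W @ W.conj().T / M                       # Hermitian PSD stand-in for R_hat
s = rng.uniform(0.1, 1.0, K)

def f(s):
    Q = (A * s) @ A.conj().T + lam * np.eye(M)   # Q = A diag(s) A^H + lam*I
    return np.real(np.trace(np.linalg.solve(Q, R))) + s.sum()

Q = (A * s) @ A.conj().T + lam * np.eye(M)
Qinv = np.linalg.inv(Q)
Mid = Qinv @ R @ Qinv
grad = 1.0 - np.real(np.einsum('ik,ij,jk->k', A.conj(), Mid, A))  # (78)

k, eps = 2, 1e-4
e = np.zeros(K); e[k] = eps
fd1 = (f(s + e) - f(s - e)) / (2 * eps)                # central difference
fd2 = (f(s + e) - 2 * f(s) + f(s - e)) / eps ** 2      # second difference
ak = A[:, k]
hkk = 2 * np.real(np.vdot(ak, Qinv @ ak)) * np.real(np.vdot(ak, Mid @ ak))  # (80)
print(np.isclose(fd1, grad[k], rtol=1e-4), np.isclose(fd2, hkk, rtol=1e-3))
```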
REFERENCES

[1] C. Steffens, M. Pesavento, and M. Pfetsch, "A compact formulation for the L21 mixed-norm minimization problem," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2017, pp. 1–5.
[2] R. Tibshirani, "Regression shrinkage and selection via the LASSO," J. Roy. Statist. Soc. Series B, Methodological, vol. 58, pp. 267–288, 1996.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, pp. 33–61, 1998.
[4] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[5] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[6] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[7] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, Aug. 2006.
[8] E. J. Candès and J. Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions," Found. Comput. Math., vol. 6, no. 2, pp. 227–254, 2006.
[9] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Nat. Acad. Sci., vol. 100, no. 5, pp. 2197–2202, 2003.
[10] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 520–544, Jan. 2010.
[11] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[12] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[13] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, 2009.
[14] E. J. Candès and C. Fernandez-Granda, "Super-resolution from noisy data," J. Fourier Anal. Appl., vol. 19, no. 6, pp. 1229–1254, 2013.
[15] E. J. Candès and C. Fernandez-Granda, "Towards a mathematical theory of super-resolution," Commun. Pure Appl. Math., vol. 67, no. 6, pp. 906–956, 2014.
[16] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Found. Comput. Math., vol. 12, no. 6, pp. 805–849, 2012.
[17] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, "Compressed sensing off the grid," IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7465–7490, Nov. 2013.
[18] B. N. Bhaskar, G. Tang, and B. Recht, "Atomic norm denoising with applications to line spectral estimation," IEEE Trans. Signal Process., vol. 61, no. 23, pp. 5987–5999, Dec. 2013.
[19] G. Tang, B. N. Bhaskar, and B. Recht, "Near minimax line spectral estimation," IEEE Trans. Inf. Theory, vol. 61, no. 1, pp. 499–512, Jan. 2015.
[20] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Roy. Statist. Soc. Series B, Statist. Methodol., vol. 68, no. 1, pp. 49–67, 2006.
[21] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Process., vol. 86, no. 3, pp. 589–602, 2006.
[22] B. A. Turlach, W. N. Venables, and S. J. Wright, "Simultaneous variable selection," Technometrics, vol. 47, no. 3, pp. 349–363, 2005.
[23] M. Kowalski, "Sparse regression using mixed norms," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 303–324, 2009.
[24] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Process., vol. 86, no. 3, pp. 572–588, 2006.
[25] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477–2488, Jul. 2005.
[26] Y. Jin and B. Rao, "Support recovery of sparse signals in the presence of multiple measurement vectors," IEEE Trans. Inf. Theory, vol. 59, no. 5, pp. 3139–3157, May 2013.
[27] J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, Dec. 2006.
[28] M.-J. Lai and Y. Liu, "The null space property for sparse recovery from multiple measurement vectors," Appl. Comput. Harmon. Anal., vol. 30, no. 3, pp. 402–406, 2011.
[29] M. Davies and Y. Eldar, "Rank awareness in joint sparse recovery," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1135–1146, Feb. 2012.
[30] Y. Li and Y. Chi, "Off-the-grid line spectrum denoising and estimation with multiple measurement vectors," IEEE Trans. Signal Process., vol. 64, no. 5, pp. 1257–1269, Mar. 2016.
[31] Z. Yang and L. Xie, "Exact joint sparse frequency recovery via optimization methods," IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5145–5157, Oct. 2016.
[32] Z. Yang and L. Xie, "On gridless sparse methods for line spectral estimation from complete and incomplete data," IEEE Trans. Signal Process., vol. 63, no. 12, pp. 3139–3153, Jun. 2015.
[33] H. Krim and M. Viberg, "Two decades of array signal processing research: The parametric approach," IEEE Signal Process. Mag., vol. 13, no. 4, pp. 67–94, Jul. 1996.
[34] H. L. van Trees, Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory. New York, NY, USA: Wiley, 2002.
[35] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, Mar. 1986.
[36] P. Stoica and N. Arye, "MUSIC, maximum likelihood, and Cramer-Rao bound," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 5, pp. 720–741, May 1989.
[37] D. Malioutov, M. Çetin, and A. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010–3022, Aug. 2005.
[38] M. M. Hyder and K. Mahata, "Direction-of-arrival estimation using a mixed ℓ2,0 norm approximation," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4646–4655, Sep. 2010.
[39] J. Kim, O. K. Lee, and J. C. Ye, "Compressive MUSIC: A missing link between compressive sensing and array signal processing," IEEE Trans. Inf. Theory, vol. 58, no. 1, pp. 278–301, Jan. 2012.
[40] J. A. Högbom, "Aperture synthesis with a non-regular distribution of interferometer baselines," Astron. Astrophys. Suppl. Series, vol. 15, pp. 417–426, Jun. 1974.
[41] I. Gorodnitsky and B. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.
[42] L. Blanco and M. Najar, "Sparse covariance fitting for direction of arrival estimation," EURASIP J. Adv. Signal Process., vol. 2012, no. 1, 2012, Art. no. 111.
[43] J. Zheng and M. Kaveh, "Sparse spatial spectral estimation: A covariance fitting algorithm, performance and regularization," IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2767–2777, Jun. 2013.
[44] P. Stoica, P. Babu, and J. Li, "New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 35–47, Jan. 2011.
[45] P. Stoica, P. Babu, and J. Li, "SPICE: A sparse covariance-based estimation method for array processing," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 629–638, Feb. 2011.
[46] P. Stoica, D. Zachariah, and J. Li, "Weighted SPICE: A unifying approach for hyperparameter-free sparse estimation," Digit. Signal Process., vol. 33, pp. 1–12, 2014.
[47] C. Rojas, D. Katselis, and H. Hjalmarsson, "A note on the SPICE method," IEEE Trans. Signal Process., vol. 61, no. 18, pp. 4545–4551, Sep. 2013.
[48] P. Babu and P. Stoica, "Connection between SPICE and square-root LASSO for sparse parameter estimation," Signal Process., vol. 95, pp. 10–14, 2014.
[49] Z. Yang and L. Xie, "Enhancing sparsity and resolution via reweighted atomic norm minimization," IEEE Trans. Signal Process., vol. 64, no. 4, pp. 995–1006, Feb. 2016.
[50] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, "Sensitivity to basis mismatch in compressed sensing," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182–2195, May 2011.
[51] M. A. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 342–349, 2010.
[52] Z. Yang, J. Li, P. Stoica, and L. Xie, "Sparse methods for direction-of-arrival estimation," in Academic Press Library in Signal Processing-Array, Radar and Communications Engineering, 1st ed., S. Theodoridis and R. Chellappa, Eds. New York, NY, USA: Academic, Oct. 2017, vol. 7, ch. 11.
[53] C. Carathéodory, "Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen," Rendiconti del Circolo Matematico di Palermo (1884–1940), vol. 32, no. 1, pp. 193–217, 1911.
[54] C. Carathéodory and L. Fejér, "Über den Zusammenhang der extremen von harmonischen Funktionen mit ihren Koeffizienten und über den Picard-Landauschen Satz," Rendiconti del Circolo Matematico di Palermo (1884–1940), vol. 32, no. 1, pp. 218–239, 1911.
[55] O. Toeplitz, "Zur Theorie der quadratischen und bilinearen Formen von unendlich vielen Veränderlichen," Mathematische Annalen, vol. 70, no. 3, pp. 351–376, 1911.
[56] G. de Prony, "Essai expérimental et analytique: sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alcool à différentes températures," J. de l'École Polytechnique, vol. 1, no. 22, pp. 24–76, 1795.
[57] Y. Hua and T. K. Sarkar, "Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 5, pp. 814–824, May 1990.
[58] D. W. Tufts and R. Kumaresan, "Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood," Proc. IEEE, vol. 70, no. 9, pp. 975–989, Sep. 1982.
[59] Z. Yang and L. Xie, "Continuous compressed sensing with a single or multiple measurement vectors," in Proc. IEEE Workshop Statist. Signal Process., Jun. 2014, pp. 288–291.
[60] G. Tang, B. Bhaskar, and B. Recht, "Sparse recovery over continuous dictionaries-just discretize," in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2013, pp. 1043–1047.
[61] MOSEK ApS, The MOSEK Optimization Toolbox for MATLAB Manual, Version 7.1 (Revision 28), 2015. [Online]. Available: http://docs.mosek.com/7.1/toolbox/index.html
[62] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49–95, 1996.
[63] W. Suleiman, C. Steffens, A. Sorg, and M. Pesavento, "Gridless compressed sensing for fully augmentable arrays," in Proc. 25th Eur. Signal Process. Conf., Kos Island, Greece, Sep. 2017, pp. 1986–1990.
[64] C. Steffens, W. Suleiman, A. Sorg, and M. Pesavento, "Gridless compressed sensing under shift-invariant sampling," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2017, pp. 4735–4739.
[65] C. Steffens and M. Pesavento, "Block- and rank-sparse recovery for direction finding in partly calibrated arrays," IEEE Trans. Signal Process., vol. 66, no. 2, pp. 384–399, Jan. 2018.
[66] Z. Qin, K. Scheinberg, and D. Goldfarb, "Efficient block-coordinate descent algorithms for the group LASSO," Math. Program. Comput., vol. 5, no. 2, pp. 143–169, 2013.
[67] S. J. Wright, "Coordinate descent algorithms," Math. Program., vol. 151, no. 1, pp. 3–34, 2015.
[68] D. Bertsekas, Nonlinear Programming, 3rd ed. Belmont, MA, USA: Athena Scientific, 2016.
[69] A. Belloni, V. Chernozhukov, and L. Wang, "Square-root lasso: Pivotal recovery of sparse signals via conic programming," Biometrika, vol. 98, no. 4, pp. 791–806, 2011.
[70] T. Park and G. Casella, "The Bayesian Lasso," J. Amer. Statist. Assoc., vol. 103, no. 482, pp. 681–686, 2008.
[71] P. Stoica and P. Babu, "SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation," Signal Process., vol. 92, no. 7, pp. 1580–1590, 2012.
[72] Z. Yang and L. Xie, "On gridless sparse methods for multi-snapshot DOA estimation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2016, pp. 3236–3240.
[73] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. New York, NY, USA: Springer-Verlag, 2008, pp. 95–110.
[74] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.1," http://cvxr.com/cvx, Mar. 2014.
[75] J. Horn and C. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1990.

Christian Steffens received the Dipl.-Ing. degree in electrical engineering from the University of Bremen, Bremen, Germany, in 2010. From 2010 to 2016, he held a research position with the Communication Systems Group, Technical University of Darmstadt, Darmstadt, Germany, where his research focused on sparse signal reconstruction, parameter estimation, array processing, and sensor networks. Since 2017, he has been working with Telespazio VEGA, Darmstadt, Germany. He was the recipient of a Student Best Paper Award at the IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM) 2014.

Marius Pesavento received the Dipl.-Ing. and M.Eng. degrees from Ruhr-University Bochum, Bochum, Germany, and McMaster University, Hamilton, ON, Canada, in 1999 and 2000, respectively, and the Dr.-Ing. degree in electrical engineering from Ruhr-University Bochum in 2005. Between 2005 and 2009, he held research positions in two start-up companies in the ICT area. In 2010, he became an Assistant Professor of robust signal processing and, in 2013, a Full Professor of communication systems with the Department of Electrical Engineering and Information Technology, Technical University Darmstadt, Darmstadt, Germany. His research interests include robust signal processing and adaptive beamforming; high-resolution sensor array processing; multiantenna and multiuser communication systems; distributed, sparse, and mixed-integer optimization techniques for signal processing, communications and machine learning; statistical signal processing; spectral analysis; and parameter estimation. He was the recipient of the 2003 ITG/VDE Best Paper Award, the 2005 Young Author Best Paper Award of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, and the 2010 Best Paper Award of the CrownCOM conference. He is a member of the Editorial Board of the EURASIP Signal Processing Journal, and served as an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING in 2012–2016. He is a member of the Sensor Array and Multichannel Technical Committee of the IEEE Signal Processing Society, and of the Special Area Teams "Signal Processing for Communications and Networking" and "Signal Processing for Multisensor Systems" of the EURASIP.

Marc E. Pfetsch received the Diploma degree in mathematics from the University of Heidelberg, Heidelberg, Germany, in 1997, and the Ph.D. degree in mathematics in 2002 and the Habilitation degree in 2008 from Technische Universität (TU) Berlin, Berlin, Germany. From 2008 to 2012, he was a Full Professor of mathematical optimization with TU Braunschweig, Braunschweig, Germany. Since April 2012, he has been a Full Professor of discrete optimization with TU Darmstadt, Darmstadt, Germany. His research interests include discrete optimization, in particular symmetry in integer programs, compressed sensing, and algorithms for mixed-integer programs.