Scalable Solvers of Random Quadratic Equations Via Stochastic Truncated Amplitude Flow
Abstract—A novel approach termed stochastic truncated amplitude flow (STAF) is developed to reconstruct an unknown $n$-dimensional real-/complex-valued signal $\mathbf{x}$ from $m$ "phaseless" quadratic equations of the form $\psi_i = |\langle \mathbf{a}_i, \mathbf{x}\rangle|$. This problem, also known as phase retrieval from magnitude-only information, is NP-hard in general. Adopting an amplitude-based nonconvex formulation, STAF leads to an iterative solver comprising two stages: s1) Orthogonality-promoting initialization through a stochastic variance reduced gradient algorithm; and, s2) a series of iterative refinements of the initialization using stochastic truncated gradient iterations. Both stages involve a single equation per iteration, thus rendering STAF a simple, scalable, and fast approach amenable to large-scale implementations that are useful when $n$ is large. When $\{\mathbf{a}_i\}_{i=1}^m$ are independent Gaussian, STAF provably recovers exactly any $\mathbf{x} \in \mathbb{R}^n$ exponentially fast based on order of $n$ quadratic equations. STAF is also robust in the presence of additive noise of bounded support. Simulated tests involving real Gaussian $\{\mathbf{a}_i\}$ vectors demonstrate that STAF empirically reconstructs any $\mathbf{x} \in \mathbb{R}^n$ exactly from about $2.3n$ magnitude-only measurements, outperforming state-of-the-art approaches and narrowing the gap from the information-theoretic number of equations $m = 2n - 1$. Extensive experiments using synthetic data and real images corroborate markedly improved performance of STAF over existing alternatives.

Index Terms—Nonconvex optimization, phase retrieval, variance reduction, Kaczmarz algorithm.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 8, APRIL 15, 2017
Manuscript received September 15, 2015; revised December 15, 2016; accepted January 5, 2017. Date of publication January 16, 2017; date of current version February 7, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yue Rong. This work was supported in part by NSF under Grant 1500713 and Grant 1514056.
G. Wang is with the Digital Technology Center and the Electrical and Computer Engineering Department, University of Minnesota, Minneapolis, MN 55455 USA and also with the School of Automation, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).
G. B. Giannakis is with the Digital Technology Center and the Electrical and Computer Engineering Department, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]).
J. Chen is with the School of Automation and State Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2017.2652392
1053-587X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

I. INTRODUCTION

CONSIDER the fundamental problem of reconstructing a general signal vector from magnitude-only measurements, e.g., the magnitude of the Fourier transform or any linear transform of the signal. This problem, also known as phase retrieval [1], arises in many fields of science and engineering ranging from X-ray crystallography [2] and optics [3] to coherent diffraction imaging [4]. In such settings, due to the physical limitations of optical detectors such as photosensitive films, charge-coupled device (CCD) cameras, and human eyes, one records only the intensity of light (which describes the absolute counts of photons or electrons that strike the detectors) but loses the phase (where the wave peaks and troughs lie) [5]. It is known that when collecting the diffraction pattern at a large enough distance from the imaging plane, the field is given by the Fourier transform of the image (up to a known phase factor). Therefore, those optical devices in the far field essentially measure only the squared modulus of the Fourier transform of the object, whereas the phase of the incident light reaching the detector is missing. Nevertheless, a great deal of information is contained in the Fourier phase. It has been well documented that the Fourier phase of an image often encodes more structural information than its Fourier magnitude [6]. Recovering the phase from magnitude-only measurements is thus of paramount practical relevance. Further details concerning recent advances in the theory and practice of phase retrieval can be found in the review [5].

Succinctly stated, the generalized phase retrieval amounts to solving a system of "phaseless" quadratic equations taking the form

$$\psi_i = |\langle \mathbf{a}_i, \mathbf{x}\rangle|, \quad 1 \le i \le m \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^n$ or $\mathbb{C}^n$ is the wanted unknown, $\mathbf{a}_i \in \mathbb{R}^n$ or $\mathbb{C}^n$ are known sensing/feature vectors, and $\boldsymbol{\psi} := [\psi_1 \cdots \psi_m]^T$ is the observed data vector. Equivalently, (1) can also be given in its squared form as $y_i = |\langle \mathbf{a}_i, \mathbf{x}\rangle|^2$, where $y_i := \psi_i^2$ denotes the intensity or the squared modulus.

In the classical discretized one-dimensional (1D) phase retrieval, the amplitude vector $\boldsymbol{\psi}$ corresponds to the $m$-point (typically, $m = 2n - 1$) Fourier transform of the length-$n$ signal $\mathbf{x}$ [5]. It has been established using the fundamental theorem of algebra that there is no unique solution in the discretized 1D phase retrieval, even if one fixes trivial ambiguities resulting from operations that preserve Fourier magnitudes, including the global phase shift, conjugate inversion, and spatial shift [7]. In fact, there are up to $2^{n-2}$ generally distinct signals with common $\boldsymbol{\psi}$ beyond trivial ambiguities [7]. To overcome this ill-posed character of the 1D phase retrieval, different approaches have been suggested. Additional constraints on the unknown signal such as sparsity or non-negativity are enforced in [8]–[10] and [12]–[15]. Other viable options include introducing specific redundancy into measurements leveraging, for example, the short-time Fourier transform [5], [16], or masks [17], or simply assuming random measurements (e.g., random Gaussian $\{\mathbf{a}_i\}$
designs) [1], [12], [18], [19]. For analytic concreteness, we will henceforth assume random measurements $\psi_i$ that are collected from the real-valued Gaussian model (1), with independently and identically distributed (i.i.d.) $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$. To demonstrate the effectiveness of our proposed algorithm, experimental implementation for the complex-valued Gaussian model with i.i.d. $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n) := \mathcal{N}(\mathbf{0}, \mathbf{I}_n/2) + j\mathcal{N}(\mathbf{0}, \mathbf{I}_n/2)$, and using real images will be included as well.

It has been recently proved that when $m \ge 2n - 1$ or $m \ge 4n - 4$ generic measurements (e.g., from the Gaussian models) are acquired, the system in (1) determines uniquely an $n$-dimensional real- or complex-valued $\mathbf{x}$ (up to a global sign or phase) [20], [21], respectively. In the real case, $m = 2n - 1$ generic measurements are also proved necessary for uniqueness [20]. Postulating existence of a unique solution $\mathbf{x}$, our goal is to devise simple yet effective algorithms amenable to large-scale implementation: i) that provably reconstruct $\mathbf{x}$ from a near-optimal number of quadratic equations as in (1); and ii) that simultaneously feature low per-iteration and overall computational complexity as well as a linear convergence rate.

Being a particular instance of nonconvex quadratic programming, the problem of solving quadratic equations subsumes as special cases various classical combinatorial optimization tasks involving Boolean variables (e.g., the NP-complete stone problem [22, Section 3.4.1], [18]). Considering for instance real-valued vectors $\mathbf{a}_i$ and $\mathbf{x}$, this problem boils down to assigning signs $s_i = \pm 1$, such that the solution to the system of linear equations $\langle \mathbf{a}_i, \mathbf{x}\rangle = s_i\sqrt{y_i}$, denoted by $\mathbf{z}$, adheres to the given equations $|\langle \mathbf{a}_i, \mathbf{z}\rangle| = \psi_i$, $1 \le i \le m$. It is clear that there are a total of $2^m$ different combinations of $\{s_i\}_{i=1}^m$, whereas only two combinations of these signs lead to $\mathbf{x}$ up to a global sign. The complex scenario becomes even more complicated, in which instead of assigning a series of signs $\{s_i\}_{i=1}^m$, one looks for a collection of unimodular complex constants $\{\sigma_i \in \mathbb{C}\}_{i=1}^m$ such that the resulting linear system and the original quadratic system are equivalent. Furthermore, solving quadratic equations has also found applications in estimating the mixture of linear regressions, in which the latent membership variables are viewed as the missing phases [23]. Despite its practical relevance across various science and engineering fields, solving systems of quadratic equations is combinatorial in nature, and NP-hard in general.

Notation: Lower- (upper-) case boldface letters denote column vectors (matrices), and calligraphic symbols are reserved for sets. The symbol $T$ ($H$) stands for transposition (conjugate transposition), and $\succeq$ for positive semidefinite matrices. For vectors, $\|\cdot\|$ signifies the Euclidean norm, and $\|\cdot\|_1$ denotes the $\ell_1$-norm. The symbol $\lceil\cdot\rceil$ is the ceiling operation that returns the smallest integer greater than or equal to the given number. For a given function $g(n)$ of integer $n > 0$, $\Theta(g(n))$ denotes the set of functions $\Theta(g(n)) = \{f(n):$ there exist positive constants $C_1$, $C_2$, and $n_0$ such that $0 \le C_1 g(n) \le f(n) \le C_2 g(n)$ for all $n \ge n_0\}$; and likewise, $O(g(n)) = \{f(n):$ there exist positive constants $C$ and $n_0$ such that $0 \le f(n) \le C g(n)$ for all $n \ge n_0\}$, and $\Omega(g(n)) = \{f(n):$ there exist positive constants $C$ and $n_0$ such that $0 \le C g(n) \le f(n)$ for all $n \ge n_0\}$.

A. Prior Art

Adopting the least-squares criterion (which would coincide with the maximum likelihood one when assuming an additive white Gaussian noise model), the task of tackling the quadratic system in (1) can be reformulated as that of minimizing the following amplitude-based empirical loss [9], [12], [19]

$$\underset{\mathbf{z}\in\mathbb{C}^n}{\text{minimize}}\ \ \frac{1}{2m}\sum_{i=1}^m \big(\psi_i - |\mathbf{a}_i^H\mathbf{z}|\big)^2 \tag{2}$$

or, the intensity-based one [1]

$$\underset{\mathbf{z}\in\mathbb{C}^n}{\text{minimize}}\ \ \frac{1}{2m}\sum_{i=1}^m \big(y_i - |\mathbf{a}_i^H\mathbf{z}|^2\big)^2 \tag{3}$$

and its counterpart for Poisson data [18]

$$\underset{\mathbf{z}\in\mathbb{C}^n}{\text{minimize}}\ \ \frac{1}{2m}\sum_{i=1}^m \big(|\mathbf{a}_i^H\mathbf{z}|^2 - y_i\log|\mathbf{a}_i^H\mathbf{z}|^2\big). \tag{4}$$

Unfortunately, the three objective functions are nonconvex because of the modulus in (2), or the quadratic terms in (3) and (4). It is well known that nonconvex functions may exhibit many stationary points, and minimizing nonconvex objectives is in general NP-hard, and hence computationally intractable [24]. It is worth stressing that it is difficult to establish convergence to a local minimum due to the existence of complicated saddle point structures [24]–[26].

Past approaches for solving quadratic equations can be grouped in two categories: convex and nonconvex ones. The nonconvex ones include the "workhorse" alternating projection algorithms [9], [27]–[29], AltMinPhase [12] and TAF [14], [15], [19], [30], trust-region [31] and majorization-minimization [32], [33], as well as the recently proposed Wirtinger-based variants such as (truncated) Wirtinger flow (WF/TWF) [1], [18], [34]. Based on STFT measurements, gradient descent-type algorithms starting with a least-squares initialization provably recover the signal from magnitude-only information under appropriate conditions [16]. Stochastic or incremental counterparts consisting of Kaczmarz and ITWF have been reported too [35], [36]. On the other hand, the convex alternatives typically rely upon the so-called matrix-lifting technique to derive semidefinite programming-based solvers such as PhaseLift [37], PhaseCut [38], and CoRK [39]. For the Gaussian model, comparisons between convex and nonconvex solvers in terms of sample complexity and computational complexity to acquire an $\epsilon$-accurate solution are listed in Table I.

B. This Paper

Adopting the amplitude-based nonconvex formulation, this paper puts forth a new algorithm, referred to as stochastic truncated amplitude flow (STAF). STAF offers an iterative algorithm that builds upon but considerably broadens the scope of TAF [19]. Specifically, it operates in two stages: Stage one employs a stochastic variance reduced gradient algorithm to obtain an orthogonality-promoting initialization, whereas the second stage applies stochastic truncated amplitude-based iterations to refine the initial estimate. Our approach is shown capable of
reconstructing any $n$-dimensional real-/complex-valued signal $\mathbf{x}$ from a nearly minimal number of magnitude-only measurements in linear time. Relative to TAF, the present paper's STAF is well suited for large-scale applications. Besides achieving order-optimal sample and computational complexities, STAF enjoys $O(n)$ per-iteration complexity in both initialization and refinement stages, which not only improves upon the $O(n^2)$ per-iteration complexity of state-of-the-art alternatives, but is also order optimal. This makes STAF applicable and appealing to common large-scale imaging phase retrieval settings. Although ITWF adopts an incremental gradient method to achieve $O(n)$ per-iteration complexity at the second stage, its first stage relies on the gradient-type power method of per-iteration complexity $O(n^2)$ to obtain a truncated spectral initialization [36]. Moreover, as will be demonstrated by our simulated tests, STAF outperforms the state-of-the-art algorithms including TAF, ITWF, and (T)WF on both synthetic data and real images in terms of both exact recovery performance and convergence speed. Specifically for the real-valued Gaussian model, STAF empirically reconstructs any real-valued $n$-dimensional signal $\mathbf{x}$ from a number $m \approx 2.3n$ of magnitude measurements, which is close to the information-theoretic limit of $m = 2n - 1$. In sharp contrast, the existing alternatives such as TAF, ITWF, and (T)WF typically require a few times more measurements to achieve exact recovery. Markedly improved performance is also witnessed for STAF when the complex-valued Gaussian model, and coded diffraction patterns of real images [17], are employed.

TABLE I
COMPARISONS OF DIFFERENT ALGORITHMS

Paper outline: The rest of the paper is outlined as follows. Section II first reviews the truncated amplitude flow (TAF) algorithm, and subsequently motivates and derives the two stages of our proposed STAF algorithm. Section III summarizes STAF, and establishes its theoretical performance. Extensive tests comparing STAF with state-of-the-art approaches on both synthetic data and real images are presented in Section IV. Finally, main proofs are given in Section V, and technical details can be found in the Appendix.

II. ALGORITHM: STOCHASTIC TRUNCATED AMPLITUDE FLOW

In this section, TAF is first reviewed, and its limitations for large-scale applications are pointed out. To cope with these limitations, simple, scalable, and fast stochastic gradient descent (SGD)-type algorithms for both the initialization and gradient refinement stages are developed.

To begin with, a number of basic concepts are introduced. If $\mathbf{x}$ in the real case solves (1), so does $-\mathbf{x}$. In the complex case, the solution set becomes $\{\mathbf{x}e^{i\phi},\ \forall\phi \in [0, 2\pi)\}$. This prompts the following definition of the Euclidean distance of any estimate $\mathbf{z}$ to the solution set of (1): $\mathrm{dist}(\mathbf{z}, \mathbf{x}) := \min\|\mathbf{z} \pm \mathbf{x}\|$ for real-valued signals, and $\mathrm{dist}(\mathbf{z}, \mathbf{x}) := \min_{\phi\in[0,2\pi)}\|\mathbf{z} - \mathbf{x}e^{i\phi}\|$ for complex ones [1]. Define also the indistinguishable global phase constant in the real case as

$$\phi(\mathbf{z}) := \begin{cases} 0, & \|\mathbf{z} - \mathbf{x}\| \le \|\mathbf{z} + \mathbf{x}\|, \\ \pi, & \text{otherwise.} \end{cases} \tag{5}$$

Henceforth, letting $\mathbf{x}$ be any solution of the given system in (1), we assume that $\phi(\mathbf{z}) = 0$; otherwise, $\mathbf{z}$ is replaced by $e^{-j\phi(\mathbf{z})}\mathbf{z}$, but for brevity of exposition, the phase adaptation term $e^{-j\phi(\mathbf{z})}$ shall be dropped whenever it is clear from the context.

A. Truncated Amplitude Flow

In this section, the two stages of TAF are outlined [19]. In stage one, TAF employs power iterations to compute an orthogonality-promoting initialization, while the second stage refines the initialization via gradient-type iterations. The orthogonality-promoting initialization builds upon a basic characteristic of high-dimensional spaces, which asserts that high-dimensional random vectors are almost always nearly orthogonal to each other [19]. Its core idea relies on approximating the unknown $\mathbf{x}$ by a vector $\mathbf{z}_0 \in \mathbb{R}^n$ most orthogonal to a carefully selected subset of design vectors $\{\mathbf{a}_i\}_{i\in\mathcal{I}_0}$, with the index set $\mathcal{I}_0 \subseteq [m] := \{1, 2, \ldots, m\}$. It is well known that the geometric relationship between any nonzero vectors $\mathbf{p} \in \mathbb{R}^n$ and $\mathbf{q} \in \mathbb{R}^n$ can be captured by their squared normalized inner-product defined as $\cos^2\theta := |\langle\mathbf{p}, \mathbf{q}\rangle|^2/(\|\mathbf{p}\|^2\|\mathbf{q}\|^2)$, where $\theta \in [0, \pi]$ signifies the angle between $\mathbf{p}$ and $\mathbf{q}$. Intuitively, the smaller $\cos^2\theta$ is, the more orthogonal the two vectors are. Assume with no loss of generality that $\|\mathbf{x}\| = 1$, which will be justified shortly. Upon obtaining the squared normalized inner-products for all pairs $\{(\mathbf{a}_i, \mathbf{x})\}_{i=1}^m$, collectively denoted by $\{\cos^2\theta_i\}_{i=1}^m$ with $\theta_i$ denoting the angle between $\mathbf{a}_i$ and $\mathbf{x}$, the orthogonality-promoting initialization constructs $\mathcal{I}_0$ by including the indices of $\mathbf{a}_i$'s that produce one of the smallest $|\mathcal{I}_0|$ normalized inner-products. Precisely, $\mathbf{z}_0$ can be found by solving [19]

$$\underset{\|\mathbf{z}\|=1}{\text{minimize}}\ \ \mathbf{z}^T\Big(\frac{1}{|\mathcal{I}_0|}\sum_{i\in\mathcal{I}_0}\frac{\mathbf{a}_i\mathbf{a}_i^T}{\|\mathbf{a}_i\|^2}\Big)\mathbf{z} \tag{6}$$

where $|\mathcal{I}_0|$ is on the order of $n$. To be precise, as shown in [19, Theorem 1], one requires for exact recovery of TAF that m ≥
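As a minimal sketch of the real-valued Gaussian model (1), the distance dist(z, x), and the orthogonality-promoting problem (6), the NumPy code below may help fix ideas. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dense eigendecomposition used in place of power or stochastic iterations, and the default fraction `frac=5/6` that sets the size of the index set are all hypothetical choices made here.

```python
import numpy as np

def gaussian_measurements(n, m, rng):
    """Draw x and i.i.d. a_i ~ N(0, I_n); return psi_i = |<a_i, x>| as in (1)."""
    x = rng.standard_normal(n)
    A = rng.standard_normal((m, n))          # row i holds a_i^T
    psi = np.abs(A @ x)
    return A, x, psi

def dist(z, x):
    """Real-valued distance to the solution set: min(||z - x||, ||z + x||)."""
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x))

def orthogonality_promoting_init(A, psi, frac=5 / 6):
    """Solve (6) with a dense eigendecomposition (illustration only).

    frac is a hypothetical knob setting |I_0| = ceil(frac * m).
    """
    m, n = A.shape
    norms_sq = np.sum(A ** 2, axis=1)
    # cos^2(theta_i) is proportional to psi_i^2 / ||a_i||^2; the unknown
    # scale ||x|| does not change which indices are smallest.
    cos_sq = psi ** 2 / norms_sq
    card = int(np.ceil(frac * m))
    idx = np.argsort(cos_sq)[:card]          # indices of the smallest values
    S = A[idx] / np.sqrt(norms_sq[idx])[:, None]
    Y = S.T @ S / card                       # matrix inside (6)
    _, eigvecs = np.linalg.eigh(Y)           # eigenvalues in ascending order
    return eigvecs[:, 0]                     # unit-norm minimizer of z^T Y z
```

The returned vector has unit norm, so it should be compared against x/||x||; the dense `eigh` call costs far more than the O(n) per-iteration schemes discussed in the text and is used here only to keep the sketch short.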
involves a matrix-vector multiplication $\mathbf{Y}_0\mathbf{u}_t$ per iteration, thus incurring per-iteration complexity of $O(n|\mathcal{I}_0|)$ or $O(n^2)$ by passing through the selected data $\{\mathbf{a}_i\}_{i\in\mathcal{I}_0}$. Furthermore, to produce an $\epsilon$-accurate solution, it incurs runtime of [40]

$$O\Big(\frac{1}{\delta}\, n|\mathcal{I}_0|\log(1/\epsilon)\Big) \tag{8}$$

depending on the eigengap $\delta > 0$, which is defined as the gap between the largest and the second largest eigenvalues of $\mathbf{Y}_0$ normalized by the largest one [40]. It is clear that when the eigengap $\delta$ is small, the runtime of $O(n|\mathcal{I}_0|\log(1/\epsilon)/\delta)$ required by the power method would be equivalent to many passes over the entire data, and this could be prohibitive for large datasets [41]. Hence, the power method may not be appropriate for computing the initialization in large-scale applications, particularly those involving small eigengaps.

The second stage of TAF relies on truncated gradient iterations of the amplitude-based cost function (2). Specifically, with

where the index set responsible for the gradient regularization is given as [19]

$$\mathcal{I}_{t+1} := \Big\{1 \le i \le m\ \Big|\ \frac{|\mathbf{a}_i^T\mathbf{z}_t|}{|\mathbf{a}_i^T\mathbf{x}|} \ge \frac{1}{1+\gamma}\Big\}, \quad \forall t \ge 0 \tag{10}$$

for some regularization parameter $\gamma > 0$.

B. Variance-reducing Orthogonality-promoting Initialization

This section first presents some empirical evidence showing that small eigengaps appear commonly in the orthogonality-promoting initialization approach. Fig. 1 plots empirical eigengaps of $\mathbf{Y}_0 \in \mathbb{R}^{n\times n}$ under the real- and complex-valued Gaussian models over 100 Monte Carlo realizations under default parameters of TAF, where $n = 10{,}000$ is fixed, and the ratio $m/n$ of the number of equations to the number of unknowns increases by 0.2 from 1 to 6. As shown in Fig. 1, the eigengaps of $\mathbf{Y}_0$ resulting from the orthogonality-promoting initialization in [19, Algorithm 1]
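To illustrate why a small eigengap δ is costly for the power method, as the runtime in (8) indicates, the following sketch counts the power iterations needed to reach a target accuracy on small symmetric matrices with a prescribed spectrum. It is an assumption-laden toy: the helper names, the error measure 1 − (uᵀv)², and the diagonal test matrices are illustrative choices made here and are not taken from the paper.

```python
import numpy as np

def relative_eigengap(Y):
    """Gap between the two largest eigenvalues, normalized by the largest."""
    lam = np.linalg.eigvalsh(Y)              # ascending order
    return (lam[-1] - lam[-2]) / lam[-1]

def power_iterations_needed(Y, eps, u0, max_iters=100000):
    """Count power iterations until 1 - (u^T v)^2 <= eps.

    v is the exact principal eigenvector (computed densely for reference),
    and u0 is the starting vector; both are assumptions of this sketch.
    """
    v = np.linalg.eigh(Y)[1][:, -1]
    u = np.array(u0, dtype=float)
    u /= np.linalg.norm(u)
    for t in range(1, max_iters + 1):
        u = Y @ u                            # one matrix-vector product
        u /= np.linalg.norm(u)
        if 1.0 - (u @ v) ** 2 <= eps:
            return t
    return max_iters

if __name__ == "__main__":
    u0 = np.ones(5) / np.sqrt(5)
    for second in (0.5, 0.95):               # eigengap 0.5 versus 0.05
        Y = np.diag([1.0, second, 0.3, 0.2, 0.1])
        print(relative_eigengap(Y), power_iterations_needed(Y, 1e-6, u0))
```

Shrinking the eigengap by a factor of ten raises the iteration count by roughly the same factor, and each iteration costs one pass over the selected data, which is the behavior that motivates a variance-reduced stochastic alternative.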
Fig. 3. Empirical success rate for: i) WF [1]; ii) TWF [18]; iii) ITWF [36]; iv) TAF [19]; and v) STAF with $n = 1{,}000$ and $m/n$ varying by 0.1 from 1 to 7 under the same orthogonality-promoting initialization. Top: Noiseless real-valued Gaussian model with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$; Bottom: Noiseless complex-valued Gaussian model with $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$.

Fig. 4. Relative error versus iterations using: i) WF [1]; ii) TWF [18]; iii) ITWF [36]; iv) TAF [19]; and v) STAF under the same orthogonality-promoting initialization. Top: Noiseless real-valued Gaussian model with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$; Bottom: Noiseless complex-valued Gaussian model with $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$, where $n = 1{,}000$, and $m = 5n$.

the same solution accuracy, in which one pass through the selected data amounts to a number $|\mathcal{I}_0|$ of gradient evaluations of component functions. First, synthetic data based experiments are conducted using the real-/complex-valued Gaussian models with $n = 10{,}000$ under the known sufficient conditions for uniqueness, i.e., $m = 2n - 1$ in the real case, and $m = 4n - 4$ in the complex case. Fig. 2 plots the error evolution of the iterates $\mathbf{u}_t$ for the power method and VR-OPI, where the error in logarithmic scale is defined as $\log_{10}\big(1 - \|\mathbf{D}^T\mathbf{u}_t\|^2/\|\mathbf{D}^T\mathbf{v}_0\|^2\big)$ with the exact principal eigenvector $\mathbf{v}_0$ computed from the SVD of $\mathbf{Y}_0 = \mathbf{D}\mathbf{D}^T$ in (7). Evidently, the inexpensive stochastic iterations of VR-OPI achieve a given solution accuracy with considerably fewer gradient evaluations or data passes in both real and complex settings. This is important for tasks of large $|\mathcal{I}_0|$, or equivalently large dimension $m$ (since $|\mathcal{I}_0| = \lceil m/6\rceil$ by default), because one fewer data pass implies $|\mathcal{I}_0|$ fewer gradient evaluations and thus results in considerable savings in computational resources.

The second experiment evaluates the refinement stage of STAF relative to its competing alternatives including those of (T)WF, TAF, and ITWF in a variety of settings. For fairness, all schemes were here initialized using the same orthogonality-promoting initialization found using 100 power iterations, and subsequently applied a number of iterations corresponding to $T = 1{,}000$ data passes. First, tests on the noiseless real- and complex-valued Gaussian models were conducted, with i.i.d. $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{1,000})$, $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{1,000})$, and i.i.d. $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{1,000})$, $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{1,000})$, respectively. Fig. 3 depicts the empirical success rate of all considered schemes with $m/n$ varying by 0.1 from 1 to 7. Fig. 4 compares the convergence speed of various schemes in terms of the number of data passes to produce solutions of a given accuracy. Apparently, starting
Fig. 5. Empirical success rate for: i) WF [1]; ii) TWF [18]; iii) ITWF [36]; iv) TAF [19]; and v) STAF with $n = 1{,}000$ and $m/n$ varying by 0.1 from 1 to 7. Top: Noiseless real-valued Gaussian model with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$; Bottom: Noiseless complex-valued Gaussian model with $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$.

Fig. 6. Relative error versus iterations using: i) WF [1]; ii) TWF [18]; iii) ITWF [36]; iv) TAF [19]; and v) STAF with $n = 1{,}000$ and $m/n = 5$. Top: Noisy real-valued Gaussian model with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$; Bottom: Noisy complex-valued Gaussian model with $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$, and $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$.

with the same initialization, STAF outperforms its competing alternatives under both real-/complex-valued Gaussian models. In particular, SGD-based STAF improves in terms of exact recovery and convergence speed over the state-of-the-art gradient-type TAF, corroborating the benefit of using SGD-type solvers to cope with saddle points and local minima of nonconvex optimization [25], [36].

The previous experiment showed improved performance of STAF under the same initialization. Now, we present numerical results comparing different schemes equipped with their own initialization, namely, WF with spectral initialization [1], (I)TWF with truncated spectral initialization [18], as well as TAF with orthogonality-promoting initialization using power iterations [19], and STAF with VR-OPI. Fig. 5 demonstrates the merits of STAF over its competing alternatives in exact recovery performance on the noiseless real-valued (left) and complex-valued (right) Gaussian model. Specifically in the real case, STAF guarantees exact recovery from about $2.3n$ magnitude-only measurements, which is close to the information-theoretic limit of $m = 2n - 1$. In comparison, existing alternatives require a few times more measurements to achieve exact recovery. STAF also performs well in the complex case.

To demonstrate the robustness of STAF against additive noise, we perform stable phase retrieval under the noisy real-/complex-valued Gaussian model $\psi_i = |\mathbf{a}_i^H\mathbf{x}| + \eta_i$, with $\eta_i \sim \mathcal{N}(0, \sigma^2\mathbf{I})$ i.i.d., and $\sigma^2 = 0.1^2\|\mathbf{x}\|^2$. The noisy data for magnitude-square based algorithms were generated as $y_i = \psi_i^2$. Curves in Fig. 6 clearly show near-perfect statistical performance and fast convergence of STAF.

Finally, to demonstrate the effectiveness and scalability of STAF on real data, the Milky Way Galaxy image¹ is considered. The color image of RGB bands is denoted by

¹ Downloaded from http://pics-about-space.com/milky-way-galaxy.
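For readers who wish to reproduce the noisy setting described above, the following sketch generates data from the model ψ_i = |a_i^H x| + η_i with σ² = 0.1²‖x‖² and forms the intensities y_i = ψ_i². It is only a plausible reading of that description: the function name, the argument names, and the convention that row i of `A` stores a_i^T are assumptions introduced here.

```python
import numpy as np

def noisy_gaussian_data(n, m, rng, complex_valued=False, noise_level=0.1):
    """Generate (A, x, psi, y) under the noisy Gaussian model (sketch)."""
    if complex_valued:
        # CN(0, I_n) := N(0, I_n / 2) + j N(0, I_n / 2)
        x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
        A = (rng.standard_normal((m, n))
             + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    else:
        x = rng.standard_normal(n)
        A = rng.standard_normal((m, n))
    sigma = noise_level * np.linalg.norm(x)   # sigma^2 = 0.1^2 ||x||^2
    eta = sigma * rng.standard_normal(m)      # real-valued additive noise
    psi = np.abs(A.conj() @ x) + eta          # row i of A holds a_i^T
    y = psi ** 2                              # data for intensity-based solvers
    return A, x, psi, y
```

A call such as `noisy_gaussian_data(1000, 5000, np.random.default_rng(0))` would match the n = 1,000 and m/n = 5 setting quoted in the caption of Fig. 6, up to the random seed, which is arbitrary here.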
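relation for the previous iteration. Continuing this process to reach the initialization $\mathbf{z}_0$ and appealing to the initialization result in (22) collectively, leads to (20), hence completes the proof of Theorem 1.

Proof of Proposition 2

To prove Proposition 2, let us first define the truncated gradient of $\ell(\mathbf{z})$ as follows

$$\nabla\ell_{\rm tr}(\mathbf{z}) = \sum_{i=1}^m \Big(\mathbf{a}_i^T\mathbf{z} - \psi_i\frac{\mathbf{a}_i^T\mathbf{z}}{|\mathbf{a}_i^T\mathbf{z}|}\Big)\mathbf{a}_i\,\mathbb{1}_{\{|\mathbf{a}_i^T\mathbf{z}| \ge \frac{1}{1+\gamma}\psi_i\}} \tag{24}$$

which corresponds to the truncated gradient employed by TAF [19]. Instrumental in proving the local error contraction in Proposition 2, the following lemma adopts a sufficient decrease result from [19, Proposition 3]. The sufficient decrease is a key step in establishing the local regularity condition [1], [18], [19], which suffices to prove linear convergence of iterative optimization algorithms.

Proposition 3: [19, Proposition 3] Consider the noise-free measurements $\psi_i = |\mathbf{a}_i^T\mathbf{x}|$ with i.i.d. $\mathbf{a}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$, $1 \le i \le m$, and $\gamma = 0.7$. For any fixed $\epsilon > 0$, there exist universal constants $c_0, c_1, c_2 > 0$ such that if $m > c_0 n$, then the following holds with probability at least $1 - c_2\exp(-c_1 m)$,

$$\frac{1}{m}\langle\nabla\ell_{\rm tr}(\mathbf{z}), \mathbf{h}\rangle \ge 2(1 - \zeta_1 - \zeta_2 - 2\epsilon)\|\mathbf{h}\|^2 \tag{25}$$

for all $\mathbf{x}, \mathbf{z} \in \mathbb{R}^n$ such that $\|\mathbf{h}\|/\|\mathbf{x}\| \le 1/10$, where estimates $\zeta_1 \approx 0.0782$, and $\zeta_2 \approx 0.3894$.

Now let us turn to the term on the left hand side of (23), which after plugging in the update of $\mathbf{z}_{t+1}$ in (17) or (18), boils down to

$$\begin{aligned}
\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x}) &= \Big\|\mathbf{h}_t - \mu_t\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}}\Big\|^2 \\
&= \|\mathbf{h}_t\|^2 - 2\mu_t\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}^T\mathbf{h}_t\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \\
&\quad + \mu_t^2\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)^2\|\mathbf{a}_{i_t}\|^2\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}}
\end{aligned} \tag{26}$$

where $\mu_t = \mu > 0$ with $i_t \in \{1, 2, \ldots, m\}$ sampled uniformly at random in (17), or $\mu_t = 1/\|\mathbf{a}_{i_t}\|^2$ with $i_t \in \{1, 2, \ldots, m\}$ selected with probability proportional to $\|\mathbf{a}_{i_t}\|^2$ in (18).

Consider first the constant step size case in (17). Take the expectation of both sides in (26) with respect to the selection of index $i_t$ (rather than the data randomness) to obtain

$$\begin{aligned}
\mathbb{E}_{i_t}\big[\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x})\big] &= \|\mathbf{h}_t\|^2 - \frac{2\mu}{m}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}^T\mathbf{h}_t\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \\
&\quad + \frac{\mu^2}{m}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)^2\|\mathbf{a}_{i_t}\|^2\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}}.
\end{aligned} \tag{27}$$

Now the task reduces to upper bounding the terms on the right hand side of (27). Note from (24) that by means of $\nabla\ell_{\rm tr}(\mathbf{z}_t)$, the second term in (27) can be re-expressed as follows

$$\begin{aligned}
-\frac{2\mu}{m}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}^T\mathbf{h}_t\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} &= -\frac{2\mu}{m}\langle\nabla\ell_{\rm tr}(\mathbf{z}_t), \mathbf{h}_t\rangle \\
&\le -4\mu(1 - \zeta_1 - \zeta_2 - 2\epsilon)\|\mathbf{h}_t\|^2
\end{aligned} \tag{28}$$

where the inequality follows from Proposition 3. Regarding the last term in (27), since for the i.i.d. real-valued Gaussian $\mathbf{a}_i$'s, $\max_{i_t\in[m]}\|\mathbf{a}_{i_t}\|^2 \le 2.3n$ holds with probability at least $1 - me^{-n/2}$ [19], and also $\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \le 1$, then the next holds

$$\begin{aligned}
\frac{\mu^2}{m}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)^2\|\mathbf{a}_{i_t}\|^2\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} &\le \frac{2.3n\mu^2}{m}\sum_{i_t=1}^m\big(|\mathbf{a}_{i_t}^T\mathbf{z}_t| - |\mathbf{a}_{i_t}^T\mathbf{x}|\big)^2 \\
&\le \frac{2.3n\mu^2}{m}\sum_{i_t=1}^m\big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \mathbf{a}_{i_t}^T\mathbf{x}\big)^2 \\
&\le \frac{2.3n\mu^2}{m}\mathbf{h}_t^T\mathbf{A}^T\mathbf{A}\mathbf{h}_t \\
&\le 2.3(1+\delta)\mu^2 n\|\mathbf{h}_t\|^2
\end{aligned} \tag{29}$$

in which the second inequality comes from $(|\mathbf{a}_{i_t}^T\mathbf{z}_t| - |\mathbf{a}_{i_t}^T\mathbf{x}|)^2 \le (\mathbf{a}_{i_t}^T\mathbf{z}_t - \mathbf{a}_{i_t}^T\mathbf{x})^2$, and the last inequality arises due to the fact that $\lambda_{\max}(\mathbf{A}^T\mathbf{A}) \le (1+\delta)m$ holds with probability at least $1 - c_2'\exp(-c_1' n\delta^2)$, provided that $m \ge c_0' n\delta^{-2}$ for some universal constants $c_0', c_1', c_2' > 0$ [49, Theorem 5.39].

Substituting (28) and (29) into (27) establishes that

$$\mathbb{E}_{i_t}\big[\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x})\big] \le \big[1 - 4\mu(1 - \zeta_1 - \zeta_2 - 2\epsilon) + 2.3(1+\delta)\mu^2 n\big]\|\mathbf{h}_t\|^2 \tag{30}$$

holds with probability exceeding $1 - c_2 m\exp(-c_1 n)$ provided that $m \ge c_0 n$, where $c_0 \ge c_0'\delta^{-2}$. To obtain legitimate estimates for the step size, fix $\epsilon, \delta > 0$ to be sufficiently small constants, say 0.01; then using (30), $\mu$ can be chosen such that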
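$4(0.98 - \zeta_1 - \zeta_2) - 2.42\mu n > 0$, yielding

$$0 < \mu < \frac{4(0.98 - \zeta_1 - \zeta_2)}{2.42n} \approx \frac{0.8469}{n} := \frac{\mu_0}{n}. \tag{31}$$

Plugging $\mu = c_3/n$ for some $0 < c_3 \le \mu_0$ into (30), gives rise to

$$\mathbb{E}_{i_t}\big[\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x})\big] \le \Big(1 - \frac{\nu}{n}\Big)\mathrm{dist}^2(\mathbf{z}_t, \mathbf{x}) \tag{32}$$

for $\nu := 4c_3(1 - \zeta_1 - \zeta_2 - 2\epsilon) - 2.3c_3^2(1+\delta) \le \nu_0 := 0.0697$, where the equality holds at the maximum step size $\mu = \mu_0$, hence concluding the proof of Proposition 2 for the constant step size case.

Now let us turn to the case of a time-varying step size. Specifically, let $\mu_t = 1/\|\mathbf{a}_{i_t}\|^2$, and $i_t$ be sampled at random from the set $\{1, 2, \ldots, m\}$ with probability $\|\mathbf{a}_{i_t}\|^2/\sum_{i_t=1}^m\|\mathbf{a}_{i_t}\|^2 = \|\mathbf{a}_{i_t}\|^2/\|\mathbf{A}\|_F^2$ [50]. Taking the expectation of both sides in (26) over $i_t$ gives rise to

$$\begin{aligned}
\mathbb{E}_{i_t}\big[\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x})\big] &= \|\mathbf{h}_t\|^2 - 2\sum_{i_t=1}^m\frac{\mathbf{a}_{i_t}^T\mathbf{h}_t}{\|\mathbf{A}\|_F^2}\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \\
&\quad + \sum_{i_t=1}^m\frac{\|\mathbf{a}_{i_t}\|^2}{\|\mathbf{A}\|_F^2}\frac{1}{\|\mathbf{a}_{i_t}\|^2}\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)^2\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}}.
\end{aligned} \tag{33}$$

$$\begin{aligned}
-\frac{2}{\|\mathbf{A}\|_F^2}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}^T\mathbf{h}_t\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} &\le -\frac{2}{(1+\sigma)mn}\sum_{i_t=1}^m\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)\mathbf{a}_{i_t}^T\mathbf{h}_t\,\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \\
&\le -\frac{4m}{(1+\sigma)mn}(1 - \zeta_1 - \zeta_2 - 2\epsilon)\|\mathbf{h}_t\|^2 \\
&\le -\frac{4}{(1+\sigma)n}(1 - \zeta_1 - \zeta_2 - 2\epsilon)\|\mathbf{h}_t\|^2
\end{aligned} \tag{34}$$

where the second inequality follows from Proposition 3, and the last inequality from the fact that $m \ge c_0 n$. Concerning the last term on the right hand side of (33), one obtains that

$$\begin{aligned}
\sum_{i_t=1}^m\frac{\|\mathbf{a}_{i_t}\|^2}{\|\mathbf{A}\|_F^2}\frac{1}{\|\mathbf{a}_{i_t}\|^2}\Big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\frac{\mathbf{a}_{i_t}^T\mathbf{z}_t}{|\mathbf{a}_{i_t}^T\mathbf{z}_t|}\Big)^2\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} &= \frac{1}{\|\mathbf{A}\|_F^2}\sum_{i_t=1}^m\big(|\mathbf{a}_{i_t}^T\mathbf{z}_t| - |\mathbf{a}_{i_t}^T\mathbf{x}|\big)^2\mathbb{1}_{\{|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \frac{\psi_{i_t}}{1+\gamma}\}} \\
&\le \frac{1}{\|\mathbf{A}\|_F^2}\sum_{i_t=1}^m\big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \mathbf{a}_{i_t}^T\mathbf{x}\big)^2 \\
&\le \frac{1}{\|\mathbf{A}\|_F^2}\mathbf{h}_t^T\mathbf{A}^T\mathbf{A}\mathbf{h}_t \\
&\le \frac{(1+\delta)m}{(1-\sigma)mn}\|\mathbf{h}_t\|^2 \\
&\le \frac{(1+\delta)}{(1-\sigma)n}\|\mathbf{h}_t\|^2
\end{aligned} \tag{35}$$

which holds with high probability as soon as $m \ge c_0 n \ge c_0'\delta^{-2}n$.

Putting results in (33), (34), and (35) together, one establishes that the following holds

$$\mathbb{E}_{i_t}\big[\mathrm{dist}^2(\mathbf{z}_{t+1}, \mathbf{x})\big] \le \Big[1 - \frac{4}{(1+\sigma)n}(1 - \zeta_1 - \zeta_2 - 2\epsilon) + \frac{(1+\delta)}{(1-\sigma)n}\Big]\|\mathbf{h}_t\|^2 \tag{36}$$

REFERENCES

[1] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985–2007, Apr. 2015.
[2] J. Miao, P. Charalambous, J. Kirz, and D. Sayre, "Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens," Nature, vol. 400, no. 6742, pp. 342–344, Jul. 1999.
[3] R. P. Millane, "Phase retrieval in crystallography and optics," J. Opt. Soc. Am. A, vol. 7, no. 3, pp. 394–411, 1990.
[4] O. Bunk et al., "Diffractive imaging for periodic samples: Retrieving one-dimensional concentration profiles across microfluidic channels," Acta Crystallograph. A, Found. Crystallograph., vol. 63, no. 4, pp. 306–314, 2007.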
WANG et al.: SCALABLE SOLVERS OF RANDOM QUADRATIC EQUATIONS VIA STAF 1973
[5] K. Jaganathan, Y. C. Eldar, and B. Hassibi, “Phase retrieval: An overview of recent developments,” arXiv:1510.07713, 2015.
[6] E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, “Phase retrieval via matrix completion,” SIAM Rev., vol. 57, no. 2, pp. 225–251, May 2015.
[7] E. Hofstetter, “Construction of time-limited functions with specified autocorrelation functions,” IEEE Trans. Inf. Theory, vol. 10, no. 2, pp. 119–126, Apr. 1964.
[8] Y. Shechtman, A. Beck, and Y. C. Eldar, “GESPAR: Efficient phase retrieval of sparse signals,” IEEE Trans. Signal Process., vol. 62, no. 4, pp. 928–938, Feb. 2014.
[9] J. R. Fienup, “Phase retrieval algorithms: A comparison,” Appl. Opt., vol. 21, no. 15, pp. 2758–2769, Aug. 1982.
[10] J. Ranieri, A. Chebira, Y. M. Lu, and M. Vetterli, “Phase retrieval for sparse signals: Uniqueness conditions,” arXiv:1308.3058, 2013.
[11] K. Jaganathan, S. Oymak, and B. Hassibi, “Recovery of sparse 1-D signals from the magnitudes of their Fourier transform,” in Proc. IEEE Int. Symp. Inf. Theory, 2012, pp. 1473–1477.
[12] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4814–4826, Sep. 2015.
[13] C. Qian, N. D. Sidiropoulos, K. Huang, L. Huang, and H. C. So, “Phase retrieval using feasible point pursuit: Algorithms and Cramer-Rao bound,” IEEE Trans. Signal Process., vol. 64, no. 20, pp. 5282–5296, Oct. 2016.
[14] G. Wang, G. B. Giannakis, J. Chen, and M. Akçakaya, “SPARTA: Sparse phase retrieval via truncated amplitude flow,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., New Orleans, LA, USA, 2017.
[15] G. Wang, L. Zhang, G. B. Giannakis, J. Chen, and M. Akçakaya, “Sparse phase retrieval via truncated amplitude flow,” arXiv:1611.07641, 2016.
[16] T. Bendory and Y. C. Eldar, “Non-convex phase retrieval from STFT measurements,” arXiv:1607.08218, 2016.
[17] E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Anal., vol. 39, no. 2, pp. 277–299, Sep. 2015.
[18] Y. Chen and E. J. Candès, “Solving random quadratic systems of equations is nearly as easy as solving linear systems,” Commun. Pure Appl. Math., to be published.
[19] G. Wang, G. B. Giannakis, and Y. C. Eldar, “Solving systems of random quadratic equations via truncated amplitude flow,” arXiv:1605.08285, 2016.
[20] R. Balan, P. Casazza, and D. Edidin, “On signal reconstruction without phase,” Appl. Comput. Harmon. Anal., vol. 20, no. 3, pp. 345–356, May 2006.
[21] A. Conca, D. Edidin, M. Hering, and C. Vinzant, “An algebraic characterization of injectivity in phase retrieval,” Appl. Comput. Harmon. Anal., vol. 38, no. 2, pp. 346–356, Mar. 2015.
[22] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Philadelphia, PA, USA: SIAM, 2001, vol. 2.
[23] Y. Chen, X. Yi, and C. Caramanis, “A convex formulation for mixed regression with two components: Minimax optimal rates,” in Proc. 27th Conf. Learn. Theory, Paris, France, Jun. 2014, pp. 560–604.
[24] K. G. Murty and S. N. Kabadi, “Some NP-complete problems in quadratic and nonlinear programming,” Math. Program., vol. 39, no. 2, pp. 117–129, 1987.
[25] R. Ge, F. Huang, C. Jin, and Y. Yuan, “Escaping from saddle points—Online stochastic gradient for tensor decomposition,” in Proc. 28th Conf. Learn. Theory, 2015, pp. 797–842.
[26] Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio, “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2933–2941.
[27] R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction,” Optik, vol. 35, pp. 237–246, Nov. 1972.
[28] I. Waldspurger, “Phase retrieval with random Gaussian sensing vectors by alternating projections,” arXiv:1609.03088, 2016.
[29] C. Qian, X. Fu, N. D. Sidiropoulos, L. Huang, and J. Xie, “Inexact alternating optimization for phase retrieval in the presence of outliers,” arXiv:1605.00973v1, 2016.
[30] G. Wang and G. B. Giannakis, “Solving random systems of quadratic equations via truncated generalized gradient flow,” in Proc. Adv. Neural Inf. Process. Syst., Barcelona, Spain, 2016, pp. 568–576.
[31] J. Sun, Q. Qu, and J. Wright, “A geometric analysis of phase retrieval,” arXiv:1602.06664, 2016.
[32] T. Qiu, P. Babu, and D. P. Palomar, “PRIME: Phase retrieval via majorization-minimization,” IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5174–5186, Oct. 2016.
[33] T. Qiu and D. Palomar, “Undersampled phase retrieval via majorization-minimization,” arXiv:1609.02842, 2016.
[34] H. Zhang, Y. Chi, and Y. Liang, “Provable non-convex phase retrieval with outliers: Median truncated Wirtinger flow,” arXiv:1603.03805, 2016.
[35] K. Wei, “Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study,” Inverse Probl., vol. 31, no. 12, p. 125008, Nov. 2015.
[36] R. Kolte and S. A. Ozgur, “Phase retrieval via incremental truncated Wirtinger flow,” arXiv:1606.03196, 2016.
[37] E. J. Candès, T. Strohmer, and V. Voroninski, “PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming,” Appl. Comput. Harmon. Anal., vol. 66, no. 8, pp. 1241–1274, Nov. 2013.
[38] I. Waldspurger, A. d’Aspremont, and S. Mallat, “Phase recovery, maxcut and complex semidefinite programming,” Math. Program., vol. 149, no. 1–2, pp. 47–81, 2015.
[39] K. Huang, Y. C. Eldar, and N. D. Sidiropoulos, “Phase retrieval from 1D Fourier measurements: Convexity, uniqueness, and algorithms,” arXiv:1603.05215, 2016.
[40] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD, USA: Johns Hopkins Univ. Press, 2012, vol. 3.
[41] O. Shamir, “Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity,” in Proc. 33rd Int. Conf. Mach. Learn., New York, NY, USA, 2016.
[42] E. Oja, “Simplified neuron model as a principal component analyzer,” J. Math. Biol., vol. 15, no. 3, pp. 267–273, Nov. 1982.
[43] R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 315–323.
[44] S. Kaczmarz, “Angenäherte Auflösung von Systemen linearer Gleichungen,” Bull. Int. de l’Académie Polonaise des Sci. et des Lett. Classe des Sci. Mathématiques et Naturelles. Série A, Sci. Mathématiques, vol. 37, pp. 355–357, 1937.
[45] P. M. Pardalos and S. A. Vavasis, “Quadratic programming with one negative eigenvalue is NP-hard,” J. Global Optim., vol. 1, no. 1, pp. 15–22, 1991.
[46] D. K. Berberidis, V. Kekatos, G. Wang, and G. B. Giannakis, “Adaptive censoring for large-scale regressions,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., South Brisbane, Qld, Australia, 2015, pp. 5475–5479.
[47] G. Wang, D. Berberidis, V. Kekatos, and G. B. Giannakis, “Online reconstruction from big data via compressive censoring,” in Proc. IEEE Global Conf. Signal Inf. Process., Atlanta, GA, USA, 2014, pp. 326–330.
[48] J. Chen, G. Wang, and J. Sun, “Power scheduling for Kalman filtering over lossy wireless sensor networks,” IET Control Theory Appl., to be published.
[49] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” arXiv:1011.3027, 2010.
[50] T. Strohmer and R. Vershynin, “A randomized Kaczmarz algorithm with exponential convergence,” J. Fourier Anal. Appl., vol. 15, no. 2, pp. 262–278, 2009.

Gang Wang (S’12) received the B.Eng. degree in electrical engineering and automation from Beijing Institute of Technology, Beijing, China, in 2011. He is currently working toward the Ph.D. degree in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. His research interests focus on the areas of high-dimensional structured signal processing and (non)convex optimization for smart power grids.
Georgios B. Giannakis (F’97) received the Diploma in electrical engineering from the National Technical University of Athens, Zografou, Greece, in 1981. From 1982 to 1986, he was with the University of Southern California, Los Angeles, CA, USA, where he received the M.Sc. degree in electrical engineering, the M.Sc. degree in mathematics, and the Ph.D. degree in electrical engineering, in 1983, 1986, and 1986, respectively. He was with the University of Virginia from 1987 to 1998, and since 1999, he has been a Professor with the University of Minnesota, Minneapolis, MN, USA, where he holds an Endowed Chair in Wireless Telecommunications, a University of Minnesota McKnight Presidential Chair in ECE, and is the Director of the Digital Technology Center.
His general interests span the areas of communications, networking, and statistical signal processing, subjects on which he has published more than 400 journal papers, 680 conference papers, 25 book chapters, two edited books, and two research monographs (h-index 119). His current research focuses on learning from Big Data, wireless cognitive radios, and network science with applications to social, brain, and power networks with renewables. He is the (co-)inventor of 28 patents issued, and the (co-)recipient of eight best paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received Technical Achievement Awards from the SP Society (2000), from EURASIP (2005), a Young Faculty Teaching Award, the G. W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (2015). He is a Fellow of EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SP Society.

Jie Chen (SM’12) received the B.S., M.S., and Ph.D. degrees from the Beijing Institute of Technology, Beijing, China, in 1986, 1996, and 2001, respectively. He is currently a Professor in the School of Automation, Beijing Institute of Technology. His current research interests include complex systems, multi-agent systems, multi-objective optimization and decision, constrained nonlinear control, and optimization methods.
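
The time-varying step size case treated in the proof of Proposition 2 can be probed numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the truncated update $\mathbf{z}_{t+1} = \mathbf{z}_t - \mu_t\big(\mathbf{a}_{i_t}^T\mathbf{z}_t - \psi_{i_t}\,\mathbf{a}_{i_t}^T\mathbf{z}_t/|\mathbf{a}_{i_t}^T\mathbf{z}_t|\big)\mathbf{a}_{i_t}$, applied only when $|\mathbf{a}_{i_t}^T\mathbf{z}_t| \ge \psi_{i_t}/(1+\gamma)$, as suggested by the terms appearing in (33) and (34). It draws $i_t$ with probability $\|\mathbf{a}_{i_t}\|^2/\|\mathbf{A}\|_F^2$ and uses $\mu_t = 1/\|\mathbf{a}_{i_t}\|^2$. The function names, the value $\gamma = 0.8$, the problem sizes, the real Gaussian model, and the starting point placed near $\mathbf{x}$ (a stand-in for the initialization stage) are all assumptions made for illustration.

```python
import itertools
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dist(z, x):
    """Distance up to a global sign, as used for real-valued signals."""
    minus = math.sqrt(sum((zi - xi) ** 2 for zi, xi in zip(z, x)))
    plus = math.sqrt(sum((zi + xi) ** 2 for zi, xi in zip(z, x)))
    return min(minus, plus)


def staf_kaczmarz_demo(n=20, m=200, iters=4000, gamma=0.8, seed=0):
    """Run the assumed truncated update with mu_t = 1/||a_i||^2.

    Returns (initial distance, final distance) to the true signal.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    psi = [abs(dot(a, x)) for a in A]          # phaseless data psi_i = |a_i^T x|
    sq_norms = [dot(a, a) for a in A]          # ||a_i||^2
    # Cumulative weights for sampling i with probability ||a_i||^2 / ||A||_F^2.
    cum_weights = list(itertools.accumulate(sq_norms))

    # Assumed starting point: x plus a perturbation of relative size 0.1.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = 0.1 * math.sqrt(dot(x, x) / dot(noise, noise))
    z = [xi + scale * ni for xi, ni in zip(x, noise)]
    d0 = dist(z, x)

    indices = range(m)
    for _ in range(iters):
        i = rng.choices(indices, cum_weights=cum_weights, k=1)[0]
        az = dot(A[i], z)
        # Truncation rule: skip equations with |a_i^T z| < psi_i / (1 + gamma).
        if az != 0.0 and abs(az) >= psi[i] / (1.0 + gamma):
            residual = az - psi[i] * az / abs(az)
            step = residual / sq_norms[i]      # mu_t * residual
            z = [zj - step * aj for zj, aj in zip(z, A[i])]
    return d0, dist(z, x)


if __name__ == "__main__":
    d0, d_final = staf_kaczmarz_demo()
    print(f"initial distance {d0:.3e}, final distance {d_final:.3e}")
```

The returned pair allows the initial and final distances to be compared for a given seed. Precomputing the cumulative weights is only a convenience that avoids rebuilding the sampling distribution at every iteration.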