Compressed Sensing and Matrix Completion With Constant Proportion of Corruptions
DOI 10.1007/s00365-012-9176-9
Xiaodong Li
Abstract In this paper, we improve existing results in the field of compressed sens-
ing and matrix completion when sampled data may be grossly corrupted. We intro-
duce three new theorems. (1) In compressed sensing, we show that if the m × n
sensing matrix has independent Gaussian entries, then one can recover a sparse sig-
nal x exactly by tractable ℓ1 minimization even if a positive fraction of the mea-
surements are arbitrarily corrupted, provided the number of nonzero entries in x is
O(m/(log(n/m) + 1)). (2) In the very general sensing model introduced in Candès
and Plan (IEEE Trans. Inf. Theory 57(11):7235–7254, 2011) and assuming a positive
fraction of corrupted measurements, exact recovery still holds if the signal now has
O(m/log^2 n) nonzero entries. (3) Finally, we prove that one can recover an n × n
low-rank matrix from m corrupted sampled entries by tractable optimization provided
the rank is on the order of O(m/(n log^2 n)); again, this holds when there is a positive
fraction of corrupted samples.
1 Introduction
Compressed sensing (CS) has been well studied in recent years [10, 19]. This novel
theory asserts that a sparse or approximately sparse signal x ∈ Rn can be acquired
by taking just a few nonadaptive linear measurements. This fact has numerous con-
sequences which are being explored in a number of fields of applied science and
engineering. In CS, the acquisition procedure is often represented as y = Ax, where
A ∈ Rm×n is called the sensing matrix and y ∈ Rm is the vector of measurements
or observations. It is now well established that the solution x̂ to the optimization
problem
min_{x̃} ‖x̃‖_1   such that   Ax̃ = y                                    (1.1)
is guaranteed to be the original signal x with high probability, provided x is suffi-
ciently sparse and A obeys certain conditions. A typical result is this: If A has iid
Gaussian entries, then exact recovery occurs provided ‖x‖_0 ≤ Cm/(log(n/m) + 1)
[11, 18, 35] for some numerical constant C > 0. Here is another example: If
A is a matrix with rows randomly selected from the Discrete Fourier Transform
(DFT) matrix, the condition becomes ‖x‖_0 ≤ Cm/ log n [10].
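As an illustration of the ℓ1 recovery program (1.1), here is a minimal numerical sketch. It assumes the NumPy and CVXPY packages; the dimensions, sparsity level, and random seed are arbitrary choices made for this example and are not taken from the paper.

```python
# Minimal sketch of basis pursuit (1.1) with an iid Gaussian sensing matrix.
# Assumes NumPy and CVXPY; sizes and sparsity are illustrative only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 80, 200, 10                                # measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)         # iid Gaussian sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                       # clean measurements

x_var = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(x_var, 1)), [A @ x_var == y])
prob.solve()
print("recovery error:", np.linalg.norm(x_var.value - x_true))
```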
This paper discusses a natural generalization of CS, which we shall refer to as
compressed sensing with corruptions. We assume that some entries of the data vector
y are totally corrupted but we have absolutely no idea which entries are unreliable.
We still want to recover the original signal efficiently and accurately. Formally, we
have the mathematical model
y = Ax + f = [A, I][x; f],                                              (1.2)

where x ∈ Rn and f ∈ Rm. The number of nonzero coefficients in x is ‖x‖_0 and sim-
ilarly for f . As in the above model, A is an m × n sensing matrix, usually sampled
from a probability distribution. The problem of recovering x (and hence f ) from y
has been recently studied in the literature in connection with some interesting appli-
cations. We discuss a few of them.
• Clipping. Signal clipping frequently appears because of nonlinearities in the acqui-
sition device [27, 36]. Here, one typically measures g(Ax) rather than Ax, where g
is a nonlinear map. Letting f = g(Ax) − Ax, we thus observe y = Ax + f .
Nonlinearities usually occur at large amplitudes so that for those components with
small amplitudes, we have f = g(Ax) − Ax = 0. This means that f is sparse and,
therefore, our model is appropriate. Just as before, locating the portion of the data
vector that has been clipped may be difficult because of additional noise.
• CS for networked data. In a sensor network, different sensors will collect measure-
ments of the same signal x independently (they each measure zi = ⟨ai, x⟩) and
send the outcome to a central hub for analysis [23, 29]. By setting ai as the row
vectors of A, this is just z = Ax. However, typically some sensors will fail to send
the measurements correctly, and will sometimes report totally meaningless mea-
surements. Therefore, we collect y = Ax + f , where f models recording errors.
There have been several theoretical papers investigating the exact recovery method
for CS with corruptions [26, 28, 29, 36, 40], and all of them consider the following
recovery procedure in the noiseless case:
min_{x̃, f̃} ‖x̃‖_1 + λ(m, n)‖f̃‖_1   such that   Ax̃ + f̃ = [A, I][x̃; f̃] = y.        (1.3)
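To make the program (1.3) concrete, the following sketch generates data from the corrupted model (1.2) and solves (1.3). It assumes NumPy and CVXPY; the problem sizes and corruption fraction are illustrative, and the weight is set to λ(m, n) = 1/√log n as in Theorem 1.2 below.

```python
# Sketch of the corrupted sensing model (1.2) and the recovery program (1.3).
# Assumes NumPy and CVXPY; sizes, sparsity, and corruption level are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, k = 120, 300, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

f_true = np.zeros(m)                                 # gross corruptions on a fraction of entries
bad = rng.choice(m, m // 10, replace=False)
f_true[bad] = 10 * rng.standard_normal(len(bad))
y = A @ x_true + f_true                              # y = Ax + f = [A, I][x; f]

lam = 1.0 / np.sqrt(np.log(n))                       # the weight used in Theorem 1.2
x_var, f_var = cp.Variable(n), cp.Variable(m)
prob = cp.Problem(cp.Minimize(cp.norm(x_var, 1) + lam * cp.norm(f_var, 1)),
                  [A @ x_var + f_var == y])
prob.solve()
print("x error:", np.linalg.norm(x_var.value - x_true),
      " f error:", np.linalg.norm(f_var.value - f_true))
```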
Matrix completion (MC) bears some similarity to CS. Here, the goal is to recover a
low-rank matrix L ∈ Rn×n from a small fraction of linear measurements. For simplic-
ity, we suppose the matrix is square as above (the general case is similar). The stan-
dard model is that we observe PO (L), where O ⊂ [n] × [n] := {1, . . . , n} × {1, . . . , n}
and
P_O(L)_{ij} = L_{ij} if (i, j) ∈ O, and P_O(L)_{ij} = 0 otherwise.
The problem is to recover the original matrix L, and there have been many papers
studying this problem in recent years, see [7, 9, 21, 25, 32], for example. Here one
minimizes the nuclear norm—the sum of all the singular values [20]—to recover the
original low-rank matrix. We discuss below an improved result due to Gross [21]
(with a slight difference).
Define O ∼ Ber(ρ) for some 0 < ρ < 1 to mean that 1{(i,j )∈O} are iid Bernoulli
random variables with parameter ρ. Then the solution to
min_{L̃} ‖L̃‖_*   such that   P_O(L̃) = P_O(L)                            (1.4)

is guaranteed to be exactly L with high probability, provided ρ ≥ C_ρ rμ log^2 n / n. Here,
Cρ is a positive numerical constant, r is the rank of L, and μ is an incoherence
parameter introduced in [7] which is only dependent on L.
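As a point of reference for the corrupted setting studied below, the noiseless program (1.4) can be solved directly with off-the-shelf convex optimization. The sketch assumes NumPy and CVXPY; the matrix size, rank, and sampling rate are illustrative choices.

```python
# Sketch of nuclear-norm matrix completion (1.4) under Ber(rho) sampling.
# Assumes NumPy and CVXPY; n, r, and rho are illustrative only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, r, rho = 40, 2, 0.5
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r matrix
mask = (rng.random((n, n)) < rho).astype(float)                      # O ~ Ber(rho)
Y = mask * L_true                                                    # the data P_O(L)

L_var = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(L_var)),
                  [cp.multiply(mask, L_var) == Y])
prob.solve()
print("relative error:", np.linalg.norm(L_var.value - L_true) / np.linalg.norm(L_true))
```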
This paper is concerned with the situation in which some entries may have been
corrupted. Therefore, our model is that we observe
PO (L) + S, (1.5)
where O and L are the same as before and S ∈ Rn×n is supported on Ω ⊂ O. Just as
in CS, this model has broad applicability. For example, Wu et al. used this model in
photometric stereo [42]. This problem has also been introduced in [12] and is related
to recent work in separating a low-rank from a sparse component [12–14, 24, 43].
A typical result is that the solution (L̂, Ŝ) to

min_{L̃, S̃} ‖L̃‖_* + λ(m, n)‖S̃‖_1   such that   P_O(L̃) + S̃ = P_O(L) + S        (1.6)
is guaranteed to be the true pair (L, S) with high probability under some assumptions
about L, O, S [12, 16]. We will compare them with our result in Sect. 1.4.
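The corrupted program (1.6) is also directly solvable with generic convex solvers. The sketch below assumes NumPy and CVXPY; n, r, ρ, s, and the corruption magnitudes are illustrative, and λ is set to 1/√(ρn log n) as in Theorem 1.3 below.

```python
# Sketch of the corrupted observation model (1.5) and the convex program (1.6).
# Assumes NumPy and CVXPY; n, r, rho, s, and corruption magnitudes are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, r, rho, s = 40, 2, 0.5, 0.1
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
maskO = rng.random((n, n)) < rho                    # O ~ Ber(rho)
maskS = maskO & (rng.random((n, n)) < s)            # Omega subset of O, so Omega ~ Ber(rho*s)
S_true = np.where(maskS, 5.0 * rng.choice([-1.0, 1.0], (n, n)), 0.0)
Y = np.where(maskO, L_true, 0.0) + S_true           # the data P_O(L) + S

lam = 1.0 / np.sqrt(rho * n * np.log(n))            # the weight used in Theorem 1.3
L_var, S_var = cp.Variable((n, n)), cp.Variable((n, n))
obj = cp.Minimize(cp.normNuc(L_var) + lam * cp.sum(cp.abs(S_var)))   # entrywise l1 on S
prob = cp.Problem(obj, [cp.multiply(maskO.astype(float), L_var) + S_var == Y])
prob.solve()
print("relative error in L:",
      np.linalg.norm(L_var.value - L_true) / np.linalg.norm(L_true))
```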
This section introduces three models and three corresponding recovery results. The
proofs of these results are deferred to Sect. 2 for Theorem 1.1, Sect. 3 for Theo-
rem 1.2, and Sect. 4 for Theorem 1.3.
Theorem 1.2 For the model above, the solution (x̂, f̂) to (1.3), with λ(m, n) =
1/√log n, is exact with probability at least 1 − Cn^{−3}, provided that s ≤ αm/(μ log^2 n)
and m_b ≤ βm/μ. Here C, α, and β are numerical constants.
Above, x and f have fixed supports and random signs. However, by a recent deran-
domization technique first introduced in [12], exact recovery with random supports
and fixed signs would also hold. We will explain this derandomization technique in
the proof of Theorem 1.3. In some specific models, such as independent rows from
the DFT matrix, μ could be a numerical constant, which implies the proportion of
corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in
the case where x and f have both fixed supports and signs. Another open problem is
whether the result would hold under more general conditions on A, as in [5], in the
case where x has both random support and random signs.
We emphasize that the sparsity condition ‖x‖_0 ≤ Cm/(μ log^2 n) is a little stronger
than the optimal result available in the noise-free literature [6, 10], namely
‖x‖_0 ≤ Cm/(μ log n). The extra logarithmic factor appears to be important in the proof
that we explain in Sect. 3, and a third open problem is whether or not it is possible to
remove this factor.
Here we do not give a sensitivity analysis for the recovery procedure as in Model 1.
Actually, by applying a method similar to that of [6] to our argument in Sect. 3, a
very good error bound could be obtained in the noisy case. However, such an extension
offers little technical novelty and would make the paper considerably longer. Therefore,
we decided to discuss only the noiseless case and to focus on the analysis of the sampling
rate and the corruption ratio.
We assume L is of rank r and write its reduced singular value decomposition (SVD)
as L = UΣV∗, where U, V ∈ R^{n×r} and Σ ∈ R^{r×r}. Let μ be the smallest quantity
such that for all 1 ≤ i ≤ n,

‖UU∗e_i‖_2^2 ≤ μr/n,   ‖VV∗e_i‖_2^2 ≤ μr/n,   and   ‖UV∗‖_∞ ≤ √(μr)/n.
This model is the same as that originally introduced in [7] and later used in [9, 12, 16,
21, 31]. We observe PO(L) + S, where O ⊂ [n] × [n] and S is supported on Ω ⊂ O.
Here we assume that O, Ω, S satisfy the following model:
Model 3.1
1. Fix an n by n matrix K, whose entries are either 1 or −1.
2. Define O ∼ Ber(ρ) for a constant ρ satisfying 0 < ρ < 1/2. Specifically,
1{(i,j )∈O} are iid Bernoulli random variables with parameter ρ.
3. Conditioning on (i, j ) ∈ O, assume that (i, j ) ∈ Ω are independent events with
P((i, j ) ∈ Ω|(i, j ) ∈ O) = s. This implies that Ω ∼ Ber(ρs).
4. Define Γ := O/Ω. Then we have Γ ∼ Ber(ρ(1 − s)).
5. Let S be supported on Ω, and sgn(S) := PΩ (K).
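Model 3.1 is straightforward to simulate; the following NumPy sketch draws (O, Ω, Γ, S) according to the five steps above. The values of n, ρ, s, and K, and the unit magnitudes chosen for S, are illustrative (the model only constrains the signs of S).

```python
# Drawing (O, Omega, Gamma, S) according to Model 3.1; assumes NumPy.
# n, rho, s, K, and the magnitudes of S are illustrative choices.
import numpy as np

rng = np.random.default_rng(4)
n, rho, s = 100, 0.3, 0.1
K = rng.choice([-1.0, 1.0], (n, n))            # step 1: a fixed sign matrix

O = rng.random((n, n)) < rho                   # step 2: O ~ Ber(rho)
Omega = O & (rng.random((n, n)) < s)           # step 3: Omega | O ~ Ber(s), so Omega ~ Ber(rho*s)
Gamma = O & ~Omega                             # step 4: Gamma = O / Omega ~ Ber(rho*(1-s))
S = np.where(Omega, K, 0.0)                    # step 5: sgn(S) = P_Omega(K), magnitudes set to 1

print("empirical rates:", O.mean(), Omega.mean(), Gamma.mean())
```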
Theorem 1.3 Under Model 3.1, suppose ρ > C_ρ μr log^2 n / n and s ≤ C_s. Moreover,
suppose λ := 1/√(ρn log n), and denote by (L̂, Ŝ) the optimal solution to problem (1.6).
Then we have (L̂, Ŝ) = (L, S) with probability at least 1 − Cn^{−3} for some numeri-
cal constant C, provided the numerical constant C_s is sufficiently small and C_ρ is
sufficiently large.
In this model, O is available while Ω, Γ, and S are not known explicitly from
the observation PO(L) + S. By the assumption O ∼ Ber(ρ), we can use |O|/n^2
to approximate ρ. From the following proof we can see that λ is not required to be
exactly 1/√(ρn log n) for exact recovery. The power of our result is that one can recover
a low-rank matrix from a nearly minimal number of samples even when a constant
proportion of these samples has been corrupted.
We only discuss the noiseless case for this model. Actually, by a method similar
to [5], a suboptimal estimation error bound can be obtained by a slight modification
of our argument. However, it is of little interest technically and falls short of the optimal
result when n is large. There are other suboptimal results for matrix completion with
noise, such as [1], but the error bound is not tight when the additional noise is small.
We want to focus on the noiseless case in this paper and leave the problem with noise
for future work.
The values of λ above are chosen to guarantee exact recovery in Theorems 1.1, 1.2,
and 1.3. In practice, λ is usually chosen by cross-validation.
1.4 Comparison with Existing Results, Related Work, and Our Contribution
In this section we will compare Theorems 1.1, 1.2, and 1.3 with existing results in
the literature.
We begin with Model 1. In [40], Wright and Ma discussed a model where the
sensing matrix A has independent columns with common mean μ and normal pertur-
bations with variance σ^2/m. They chose λ(m, n) = 1 and proved that (x̂, f̂) = (x, f)
with high probability, provided ‖x‖_0 ≤ C1(σ, n/m)m, ‖f‖_0 ≤ C2(σ, n/m)m, and f
has random signs. Here C1(σ, n/m) is much smaller than C/(log(n/m) + 1). We note
that since [40] considers a different model, motivated by [41], it may not be directly
comparable with ours. However, for our motivation of CS with corruptions, we assume
that A has a symmetric distribution and obtain a better sampling rate.
A bit later, Laska et al. [26] and Li et al. [28] also studied this problem. By setting
λ(m, n) = 1, both papers establish that for Gaussian (or sub-Gaussian) sensing matri-
ces A, if m > C(‖x‖_0 + ‖f‖_0) log((n + m)/(‖x‖_0 + ‖f‖_0)), then the recovery is ex-
act. This follows from the fact that [A, I] obeys a restricted isometry property known
to guarantee exact recovery of sparse vectors via ℓ1 minimization. Furthermore, the
sparsity requirement on x is the same as that found in the standard CS literature,
namely, ‖x‖_0 ≤ Cm/(log(n/m) + 1). However, the result does not allow a positive
fraction of corruptions. For example, if m = √n, we have ‖f‖_0/m ≤ 2/ log n, which
goes to zero as n goes to infinity.
As for Model 2, an interesting piece of work [29] (and later [30] on the noisy
case) appeared during the preparation of this paper. These papers discuss models in
which A is formed by selecting rows from an orthogonal matrix with low incoherence
parameter μ, which is the minimum value such that n|Aij |2 ≤ μ for any i, j . The
main result states that selecting λ = n/(Cμm log n) gives exact recovery under
the following assumptions: (1) the rows of A are chosen from an orthogonal matrix
uniformly at random; (2) x is a random signal with independent signs and equally
likely to be either ±1; (3) the support of f is chosen uniformly at random. (By the
derandomization technique introduced in [12] and used in [29], it would have been
sufficient to assume that the signs of f are independent and take on the values ±1
with equal probability.) Finally, the sparsity conditions require m ≥ Cμ^2‖x‖_0(log n)^2
and ‖f‖_0 ≤ Cm, which are nearly optimal, since the best known sparsity condition
when f = 0 is m ≥ Cμ‖x‖_0 log n. In other words, the result is optimal up to an extra
factor of μ log n; the sparsity condition on f is of course nearly optimal.
However, the model for A does not include some models frequently discussed
in the literature such as subsampled tight or continuous frames. Against this back-
ground, a recent paper of Candès and Plan [6] considers a very general framework,
which includes a lot of common models in the literature. Theorem 1.2 in our paper
is similar to Theorem 1 in [29]. It assumes similar sparsity conditions, but is based
on this much broader and more applicable model introduced in [6]. Notice that we
require m ≥ Cμ‖x‖_0(log n)^2, whereas [29] requires m ≥ Cμ^2‖x‖_0(log n)^2. There-
fore, we improve the condition by a factor of μ, which is always at least 1 and can
be as large as n. However, our result imposes ‖f‖_0 ≤ Cm/μ, which is worse than
‖f‖_0 ≤ γm by the same factor. In [29], the parameter λ depends upon μ, while our
λ is only a function of m and n. This is why the results differ, and we prefer to use
a value of λ that does not depend on μ because in some applications, an accurate
estimate of μ may be difficult to obtain. In addition, we use different techniques of
proof in which the clever golfing scheme of [21] is exploited.
Sparse approximation is another problem of an underdetermined linear system
where the dictionary matrix A is always assumed to be deterministic. Readers inter-
ested in this problem (which always requires stronger sparsity conditions) may also
want to study the recent paper [36] by Studer et al. There, the authors introduce a
more general problem of the form y = Ax + Bf and analyze the performance of
ℓ1-recovery techniques by using ideas which have been popularized under the name
of generalized uncertainty principles in the basis pursuit and sparse approximation
literature.
As for Model 3, Theorem 1.3 is a significant extension of the results presented in
[12], in which the authors have a stringent requirement ρ = 0.1. In a very recent and
independent work [16], the authors consider a model where both O and Ω are unions
of stochastic and deterministic subsets, while we only assume the stochastic model.
We refer interested readers to that paper for the details. However, restricting attention
to their results on stochastic O and Ω, a direct comparison shows that the number of
samples we need is smaller than theirs by several logarithmic factors. Actually, the
requirement on ρ in our paper matches the optimal one known even for clean data in
the MC literature. Finally, we want to emphasize that the random
support assumption is essential in Theorem 1.3 when the rank is large. Examples can
be found in [24].
We wish to close our introduction with a few words concerning the techniques of
proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted
isometry, which is a standard technique in the literature of CS. However, our argu-
ment involves a generalization of the restricted isometry concept. The proofs of The-
orems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered
by David Gross [21] and later used in [6, 12, 31] to construct dual certificates. Our
proof leverages results from [12]. However, we contribute novel elements by finding
In the proof of Theorem 1.1, we will see the notation PT x. Here x is a k-dimensional
vector and T is a subset of {1, . . . , k}. We also use T to represent the subspace of
all k-dimensional vectors supported on T . Then PT x is the projection of x onto the
subspace T, which keeps the values of x on the support T and sets the other
elements to zero. In this section, we use the floor notation ⌊·⌋ to denote the integer
part of a real number.
First we generalize the concept of the restricted isometry property (RIP) [8]:
Definition 2.1 For any matrix Φ ∈ R^{l×(n+m)}, define the RIP-constant δ_{s1,s2} as the
infimum value of δ such that

(1 − δ)(‖x‖_2^2 + ‖f‖_2^2) ≤ ‖Φ[x; f]‖_2^2 ≤ (1 + δ)(‖x‖_2^2 + ‖f‖_2^2)

for all x ∈ R^n with ‖x‖_0 ≤ s1 and all f ∈ R^m with ‖f‖_0 ≤ s2.
Proof First, we suppose ‖x1‖_2^2 + ‖f1‖_2^2 = ‖x2‖_2^2 + ‖f2‖_2^2 = 1. By the definition of
δ_{s1,s2}, we have

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 + x2; f1 + f2], Φ[x1 + x2; f1 + f2]⟩ ≤ 2(1 + δ_{s1,s2})

and

2(1 − δ_{s1,s2}) ≤ ⟨Φ[x1 − x2; f1 − f2], Φ[x1 − x2; f1 − f2]⟩ ≤ 2(1 + δ_{s1,s2}).

By the above inequalities, we have |⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2}, and hence by homo-
geneity, we have |⟨Φ[x1; f1], Φ[x2; f2]⟩| ≤ δ_{s1,s2} √(‖x1‖_2^2 + ‖f1‖_2^2) √(‖x2‖_2^2 + ‖f2‖_2^2) without
the norm assumption.
Lemma 2.3 Suppose Φ ∈ R^{l×(n+m)} has RIP-constant δ_{2s1,2s2} < 1/18 (s1, s2 > 0) and
λ lies between (1/2)√(s1/s2) and 2√(s1/s2). Then for any x ∈ R^n with |supp(x)| ≤ s1 and any f ∈ R^m
It is easy to check that the original (x, f ) satisfies the inequality constraint in (1.7),
so we have
and
∑_{j≥2} ‖P_{Vj} f‖_2 ≤ s2^{−1/2} ‖P_{V0^c} f‖_1.                             (2.3)
‖x + x̃‖_1 = ‖P_{T0}x + P_{T0}x̃‖_1 + ‖P_{T0^c}x̃‖_1 ≥ ‖x‖_1 − ‖P_{T0}x̃‖_1 + ‖P_{T0^c}x̃‖_1,   (2.4)
and similarly,
Moreover, since
∑_{j≥2} ‖P_{Tj}x‖_2 + ∑_{j≥2} ‖P_{Vj}f‖_2
    ≤ s1^{−1/2} ‖P_{T0^c}x‖_1 + s2^{−1/2} ‖P_{V0^c}f‖_1        by (2.2) and (2.3),
    ≤ 2 s1^{−1/2} (‖P_{T0^c}x‖_1 + λ‖P_{V0^c}f‖_1)             by λ > (1/2)√(s1/s2),
    ≤ 2 s1^{−1/2} (‖P_{T0}x‖_1 + λ‖P_{V0}f‖_1)                 by (2.6),
    ≤ 2 s1^{−1/2} (s1^{1/2} ‖P_{T0}x‖_2 + λ s2^{1/2} ‖P_{V0}f‖_2)   by the Cauchy–Schwarz inequality,
    ≤ 4‖P_{T0}x‖_2 + 4‖P_{V0}f‖_2                              by λ < 2√(s1/s2),
we have
‖[P_{T0}x; P_{V0}f]‖_2 + ‖[P_{T1}x; P_{V1}f]‖_2 + ∑_{j≥2} ‖P_{Tj}x‖_2 + ∑_{j≥2} ‖P_{Vj}f‖_2
    ≤ 8 √(‖P_{T0}x‖_2^2 + ‖P_{T1}x‖_2^2 + ‖P_{V0}f‖_2^2 + ‖P_{V1}f‖_2^2).
Since
∑_{j≥2} ‖P_{Tj}x‖_2 + ∑_{j≥2} ‖P_{Vj}f‖_2 ≤ 4‖P_{T0}x‖_2 + 4‖P_{V0}f‖_2,
we have
‖x‖_2 + ‖f‖_2 ≤ 5(‖P_{T0}x‖_2 + ‖P_{V0}f‖_2) + (‖P_{T1}x‖_2 + ‖P_{V1}f‖_2)
    ≤ 5√2 √(‖P_{T0}x‖_2^2 + ‖P_{T1}x‖_2^2 + ‖P_{V0}f‖_2^2 + ‖P_{V1}f‖_2^2)
    ≤ 4√(13 + 13δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2}) …
We now cite a well-known result in the literature of CS, e.g., Theorem 5.2 of [3].
Lemma 2.4 Suppose A is a random matrix defined in Model 1. Then for any 0 < δ <
1, there exist c1 (δ), c2 (δ) > 0 such that with probability at least 1 − 2 exp(−c2 (δ)m),
Also, we cite a well-known result that gives a bound on the largest singular
value of a random matrix, e.g., [17] and [39].
Proof Suppose α, δ are two constants independent of m and n, whose values will
be specified later. Set s1 = ⌊αm/(log(n/m) + 1)⌋ and s2 = ⌊αm⌋. We want to bound the RIP-
constant δ_{2s1,2s2} for the m × (n + m) matrix Φ = [A, I] when α is sufficiently small.
For any T with |T | = 2s1 and V with |V | = 2s2 , and any x with supp(x) ⊂ T and f
with supp(f ) ⊂ V , we have
‖[A, I][x; f]‖_2^2 = ‖Ax + f‖_2^2 = ‖Ax‖_2^2 + ‖f‖_2^2 + 2⟨P_V A P_T x, f⟩
holds universally for any such T and V with probability at least 1 − 2 exp(−(δ^2/2 −
α_1 − α_2)m).
In this section, we will encounter several absolute constants. Instead of denoting them
by C1, C2, . . . , we simply write C; its value may change from line to line. Also, we
will use the phrase “with high probability” to mean with probability at least 1−Cn−c ,
where C > 0 is a numerical constant and c = 3, 4, or 5 depending on the context.
Here we will use a lot of notation to represent sub-matrices and sub-vectors. Sup-
pose A ∈ Rm×n , P ⊂ [m] := {1, . . . , m}, Q ⊂ [n], and i ∈ [n]. We denote by AP ,: the
sub-matrix of A with row indices contained in P , by A:,Q the sub-matrix of A with
column indices contained in Q, and by AP ,Q the sub-matrix of A with row indices
contained in P and column indices contained in Q. Moreover, we denote by AP ,i the
sub-matrix of A with row indices contained in P and column i, which is actually a
column vector.
The term “vector” means column vector in this section, and all row vectors are
denoted by an adjoint of a vector, such as a ∗ for a vector a. Suppose a is a vector and
T a subset of indices. Then we denote by aT the restriction of a on T , i.e., a vector
with all elements of a with indices in T . For any vector v, we use v{i} to denote the
i-th element of v.
To prove Theorem 1.2, we need some supporting lemmas. Because our model of
sensing matrix A is the same as in [6], we will cite some lemmas from it directly.
This lemma was proved in [6] via the matrix Bernstein inequality, which was first
introduced in [2]. A deep generalization is given in [38].
Lemma 3.2 (Lemma 2.4 of [6]) Suppose A is as defined in Model 2. Fix T ⊂ [n]
with |T| = s and v ∈ R^s. Then ‖A∗_{:,T^c} A_{:,T} v‖_∞ ≤ (1/(20√s))‖v‖_2 with high probability,
provided s ≤ γm/(μ log n), where γ is some absolute constant.

Lemma 3.3 (Lemma 2.5 of [6]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with
|T| = s. Then max_{i∈T^c} ‖A∗_{:,T} A_{:,i}‖_2 ≤ 1 with high probability, provided s ≤ γm/(μ log n),
where γ is some absolute constant.
In this part, we will give a complete proof of Theorem 1.2 with a powerful technique
called the “golfing scheme,” introduced by David Gross in [21] and used later in [12]
and [6]. Under the assumption of Model 2, we additionally assume s ≤ αm/(μ log^2 n) and
m_b ≤ βm/μ, where α and β are numerical constants whose values will be specified
later.
First we give two useful inequalities. By replacing A with √(m/(m − m_b)) A_{B^c,T} in
Lemma 3.1 and Lemma 3.3, we have

‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T} − I‖_{2,2} ≤ 1/2                          (3.1)

and

max_{i∈T^c} ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,i}‖_2 ≤ 1.                        (3.2)
(3.1) and (3.2) hold with high probability, provided α and β are sufficiently small.
We assume (3.1) and (3.2) hold throughout this section.
First we prove that the solution (x̂, fˆ) of (1.3) equals (x, f ) if we can find an
appropriate dual vector qB c satisfying the following requirement. This is actually an
“inexact dual vector” of the optimization problem (1.3). This idea was first given
explicitly in [22] and [21], and related to [4]. We give a result similar to [6].
Lemma 3.4 (Inexact Duality) Suppose there exists a vector q_{B^c} ∈ R^{m−m_b} satisfying

‖v_T − sgn(x_T)‖_2 ≤ λ/4,   ‖v_{T^c}‖_∞ ≤ 1/4,   and   ‖q_{B^c}‖_∞ ≤ λ/4,        (3.3)

where

v = A∗_{B^c,:} q_{B^c} + λ A∗_{B,:} sgn(f_B).                                   (3.4)

Then the solution (x̂, f̂) of (1.3) equals (x, f) provided β is sufficiently small and
λ < 3/2.
h_{T^c} = x̂_{T^c}.                                                          (3.5)

‖x̂‖_1 + λ‖f̂‖_1
    = ⟨x̂_T, sgn(x̂_T)⟩ + ‖x̂_{T^c}‖_1 + λ(⟨f̂_B, sgn(f̂_B)⟩ + ‖f̂_{B^c}‖_1)
    ≥ ⟨x̂_T, sgn(x_T)⟩ + ‖x̂_{T^c}‖_1 + λ(⟨f̂_B, sgn(f_B)⟩ + ‖f̂_{B^c}‖_1)
    = ⟨x_T + h_T, sgn(x_T)⟩ + ‖h_{T^c}‖_1 + λ(⟨f_B − A_{B,:}h, sgn(f_B)⟩ + ‖A_{B^c,:}h‖_1)   by (3.5), (3.6)
    = ‖x‖_1 + λ‖f‖_1 + ‖h_{T^c}‖_1 + λ‖A_{B^c,:}h‖_1 + ⟨h_T, sgn(x_T)⟩ − λ⟨A_{B,:}h, sgn(f_B)⟩.
By (3.4), we have
⟨h_T, v_T⟩ + ⟨h_{T^c}, v_{T^c}⟩ = ⟨h, v⟩ = ⟨h, A∗_{B^c,:} q_{B^c} + λ A∗_{B,:} sgn(f_B)⟩
    = ⟨A_{B^c,:}h, q_{B^c}⟩ + λ⟨A_{B,:}h, sgn(f_B)⟩,

−(λ/4)‖h_T‖_2 + (3λ/4)‖A_{B^c,:}h‖_1 + (3/4)‖h_{T^c}‖_1 ≤ 0.                  (3.8)
By (3.1), we have ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T}‖_{2,2} ≤ 3/2, and the smallest singular value of
(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T} is at least 1/2. Therefore,
‖h_T‖_2 ≤ 2 ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T} h_T‖_2
    ≤ 2 (‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T^c} h_{T^c}‖_2 + ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,:} h‖_2)
    ≤ 2 ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,T^c} h_{T^c}‖_2 + √6 √(m/(m − m_b)) ‖A_{B^c,:} h‖_2
    ≤ 2 ∑_{i∈T^c} ‖(m/(m − m_b)) A∗_{B^c,T} A_{B^c,i}‖_2 |h{i}| + √6 √(m/(m − m_b)) ‖A_{B^c,:} h‖_2.

We know 3/4 − (√6/4)√(m/(m − m_b)) > 0 when β is sufficiently small. Moreover, by the assump-
tion λ < 3/2, we have h_{T^c} = 0 and A_{B^c,:}h = 0. Since A_{B^c,:}h = A_{B^c,T}h_T + A_{B^c,T^c}h_{T^c},
we have A_{B^c,T}h_T = 0. The inequality (3.1) implies that A_{B^c,T} is injective, so h_T = 0
and h = h_T + h_{T^c} = 0, which implies (x̂, f̂) = (x, f).
s ≤ αC m_1/(μ log^2 n),   s ≤ αC m_2/(μ log^2 n),   s ≤ αC m_k/(μ log n)   for k = 3, . . . , l.      (3.9)
Then by Lemma 3.1, replacing A with √(m/m_j) A_{G_j,T}, we have the following inequali-
ties:

‖(m/m_j) A∗_{G_j,T} A_{G_j,T} − I‖_{2,2} ≤ 1/(2√log n)   for j = 1, 2;            (3.10)

‖(m/m_j) A∗_{G_j,T} A_{G_j,T} − I‖_{2,2} ≤ 1/2   for j = 3, . . . , l;              (3.11)
and

p_i = (I − (m/m_i) A∗_{G_i,T} A_{G_i,T}) p_{i−1}
    = (I − (m/m_i) A∗_{G_i,T} A_{G_i,T}) · · · (I − (m/m_1) A∗_{G_1,T} A_{G_1,T}) p_0      (3.13)

for i = 1, . . . , l, and construct

q_{B^c} = [(m/m_1) A_{G_1,T} p_0 ; . . . ; (m/m_l) A_{G_l,T} p_{l−1}].              (3.14)
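To make the recursion concrete, here is a schematic NumPy version of (3.13) and (3.14). It is a sketch under simplifying assumptions: the blocks G_1, . . . , G_l are split into equal sizes, the normalization uses the number of rows of A_{B^c,:} in place of m, and p_0 is simply taken to be sgn(x_T); none of these choices come from the proof itself.

```python
# Schematic golfing recursion (3.13) and stacked dual vector (3.14); assumes NumPy.
# Block sizes, normalization, and p0 are illustrative simplifications.
import numpy as np

rng = np.random.default_rng(5)

def golfing_dual(A_Bc, T, p0, l):
    """A_Bc: rows of A indexed by B^c; T: support of x; p0: starting vector on T."""
    m_rows = A_Bc.shape[0]
    blocks = np.array_split(rng.permutation(m_rows), l)      # G_1, ..., G_l
    p, q_pieces = p0.copy(), []
    for G in blocks:
        A_GT = A_Bc[np.ix_(G, T)]
        scale = m_rows / len(G)                   # plays the role of m / m_i
        q_pieces.append(scale * (A_GT @ p))       # one block of q_{B^c}, as in (3.14)
        p = p - scale * (A_GT.T @ (A_GT @ p))     # p_i = (I - (m/m_i) A*_{G_i,T} A_{G_i,T}) p_{i-1}
    return np.concatenate(q_pieces), p            # q_{B^c} and p_l, which should be small

# Tiny usage example with an iid Gaussian A_{B^c,:} and p0 = sgn(x_T):
m_rows, n, s = 500, 100, 5
A_Bc = rng.standard_normal((m_rows, n)) / np.sqrt(m_rows)
q_Bc, p_l = golfing_dual(A_Bc, np.arange(s), np.ones(s), l=6)
print("||p_l||_2 =", np.linalg.norm(p_l))
```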
We now bound the ℓ2 norm of p_i. Actually, by (3.10), (3.11), and (3.13), we have

‖p_1‖_2 ≤ (1/(2√log n)) ‖p_0‖_2,                                              (3.16)

‖p_2‖_2 ≤ (1/(4 log n)) ‖p_0‖_2,                                              (3.17)

‖p_j‖_2 ≤ (1/log n)(1/2)^j ‖p_0‖_2   for j = 3, . . . , l.                    (3.18)
Now we will prove our constructed qB c satisfies the desired requirements.
The Proof of ‖u_T + λA∗_{B,T} sgn(f_B) − sgn(x_T)‖_2 ≤ λ/4  By (3.15) and (3.13), we
have u_T = ∑_{i=1}^l (m/m_i) A∗_{G_i,T} A_{G_i,T} p_{i−1} = ∑_{i=1}^l (p_{i−1} − p_i) = p_0 − p_l. Then by
(3.12), we have ‖u_T + λA∗_{B,T} sgn(f_B) − sgn(x_T)‖_2 = ‖u_T − p_0‖_2 = ‖p_l‖_2. Since
‖λA∗_{B,:} sgn(f_B)‖_∞ ≤ 1/8, we have ‖λA∗_{B,T} sgn(f_B)‖_2 ≤ (1/8)√s, which implies

‖p_0‖_2 = ‖λA∗_{B,T} sgn(f_B) − sgn(x_T)‖_2 ≤ (9/8)√s.                        (3.19)

Then by (3.18) and l = ⌊log_2 n⌋ + 1, we have ‖p_l‖_2 ≤ (1/log n)(1/2)^l (9/8)√s ≤
(1/log n)(1/n)(9/8)·√(αm/(μ log^2 n)) ≤ 1/(4√log n) = λ/4, provided α is sufficiently small.
The Proof of ‖u_{T^c}‖_∞ ≤ 1/8  By (3.15), we have u_{T^c} = ∑_{i=1}^l (m/m_i) A∗_{G_i,T^c} A_{G_i,T} p_{i−1}.
Recall that A_{G_1,:}, . . . , A_{G_l,:} are independent, so by the construction of p_{i−1} we
know that A_{G_i,:} and p_{i−1} are independent. Replacing A with √(m/m_i) A_{G_i,:} in Lemma
3.2, and by the sparsity condition (3.9), we have ‖∑_{i=1}^l (m/m_i) A∗_{G_i,T^c} A_{G_i,T} p_{i−1}‖_∞ ≤
∑_{i=1}^l (1/(20√s)) ‖p_{i−1}‖_2 with high probability, provided α is sufficiently small. By (3.16),
(3.17), (3.18), and (3.19), we have ‖u_{T^c}‖_∞ ≤ ∑_{i=1}^l (1/(20√s)) ‖p_{i−1}‖_2 ≤ (1/(20√s)) · 2‖p_0‖_2
< 1/8.
The Proof of ‖q_{B^c}‖_∞ ≤ λ/4  For k = 1, . . . , l, we denote A_{G_k,:} = (1/√m)[ã∗_{k1}; . . . ; ã∗_{k m_k}],
and set

w = (I − (m/m_1) A∗_{G_1,T} A_{G_1,T}) · · · (I − (m/m_{k−1}) A∗_{G_{k−1},T} A_{G_{k−1},T}) (ã_{kj})_T.      (3.20)
and

E[((ã_i)∗_T w · z{i})^2] = E[w∗ (ã_i)_T (ã_i)∗_T w] = w∗ E[(ã_i)_T (ã_i)∗_T] w = ‖w‖_2^2.

By choosing some numerical constant C and t = C√(m log n) ‖w‖_2, we have

|w∗ A∗_{B,T} sgn(f_B)| ≤ C√(log n) ‖w‖_2                                     (3.22)
Here we would like to compare our golfing scheme with that in [6]. There are
mainly two differences. One is that we have an extra term λA∗_{B,:} sgn(f_B) in the dual
vector. To obtain the inequality ‖v_{T^c}‖_∞ ≤ 1/4, we propose to bound ‖u_{T^c}‖_∞ and
‖λA∗_{B,:} sgn(f_B)‖_∞ separately, and this leads to the extra log factor compared
with [6]. Moreover, by using the golfing scheme to construct the dual vector, we need
to bound the term ‖q_{B^c}‖_∞, which is not necessary in [6]. This inevitably requires
the random sign assumption on the signal.
In this section, the capital letters X, Y, etc. represent matrices, and the symbols in
script font I, P_T, etc. represent linear operators from a matrix space to a matrix
space. Moreover, for any Ω0 ⊂ [n] × [n], the operator P_{Ω0} applied to a matrix M keeps
the entries of M on the support Ω0 and sets the other entries to zero. For any n × n
matrix A, denote by ‖A‖_F, ‖A‖, ‖A‖_∞, and ‖A‖_∗, respectively, the Frobenius norm,
the operator norm (the largest singular value), the largest magnitude among the entries,
and the nuclear norm (the sum of all singular values).
Similarly to Sect. 3, instead of denoting them as C1 , C2 , . . . , we just use C, whose
values change from line to line. Also, we will use the phrase “with high probability”
to mean with probability at least 1 − Cn−c , where C > 0 is a numerical constant and
c = 3, 4, or 5 depending on the context.
Model 3.1 is natural and used in [12], but we will use the following equivalent model
for convenience:
Model 3.2
1. Fix an n by n matrix K, whose entries are either 1 or −1.
2. Define two independent random subsets of [n] × [n]: Γ′ ∼ Ber((1 − 2s)ρ) and
Ω′ ∼ Ber(2sρ/(1 − ρ + 2sρ)). Moreover, let O := Γ′ ∪ Ω′, which thus satisfies O ∼
Ber(ρ).
3. Define an n × n random matrix W with independent entries Wij satisfying
P(Wij = 1) = P(Wij = −1) = 1/2.
4. Define Ω′′ ⊂ Ω′: Ω′′ := {(i, j) : (i, j) ∈ Ω′, Wij = Kij}.
5. Define Ω := Ω′′/Γ′, and Γ := O/Ω.
6. Let S satisfy sgn(S) := PΩ(K).
Obviously, in both Model 3.1 and Model 3.2, the whole setting is determinis-
tic if we fix (O, Ω). Therefore, the probability of (L̂, Ŝ) = (L, S) is determined
by the joint distribution of (O, Ω). It is not difficult to prove that the joint dis-
tributions of (O, Ω) in both models are the same. Indeed, in Model 3.1, we have
that (1{(i,j )∈O} , 1{(i,j )∈Ω} ) are iid random vectors with the probability distribu-
tion P(1{(i,j )∈O} = 1) = ρ, P(1{(i,j )∈Ω} = 1|1{(i,j )∈O} = 1) = s and P(1{(i,j )∈Ω} =
This implies that (1{(i,j )∈O} , 1{(i,j )∈Ω} ) are independent random vectors. More-
over, it is easy to calculate that P(1{(i,j )∈O} = 1) = ρ, P(1{(i,j )∈Ω} = 1) = sρ and
P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 0) = 0. Then we have
P(1{(i,j )∈Ω} = 1|1{(i,j )∈O} = 1) = P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 1)/ P(1{(i,j )∈O} = 1)
=s
and
P(1{(i,j )∈Ω} = 1|1{(i,j )∈O} = 0) = P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 0)/ P(1{(i,j )∈O} = 0)
= 0.
Notice that although (1{(i,j )∈O} , 1{(i,j )∈Ω} ) depends on K, its distribution does not.
By the above, we know that (O, Ω) has the same distribution in both models. There-
fore in the following we will use Model 3.2 instead. The advantage of using Model 3.2
is that we can utilize Γ , Ω , W , etc. as auxiliaries.
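A quick Monte Carlo check of this equivalence is easy to run; the sketch below (assuming NumPy) compares the empirical marginals and the conditional probability for a single entry under the two models, using the primed auxiliary sets written above. The parameters ρ, s, and the number of trials are arbitrary.

```python
# Monte Carlo sanity check that (1_{(i,j) in O}, 1_{(i,j) in Omega}) has the same
# distribution under Model 3.1 and Model 3.2, for one fixed entry; assumes NumPy.
import numpy as np

rng = np.random.default_rng(6)
rho, s, trials = 0.4, 0.2, 200_000
K_ij = 1.0                                                   # a fixed entry of K

# Model 3.1 for one entry
O1 = rng.random(trials) < rho
Om1 = O1 & (rng.random(trials) < s)

# Model 3.2 for the same entry
Gp  = rng.random(trials) < (1 - 2 * s) * rho                 # Gamma'
Opr = rng.random(trials) < 2 * s * rho / (1 - rho + 2 * s * rho)   # Omega'
O2  = Gp | Opr
W   = rng.choice([-1.0, 1.0], trials)
Om2 = Opr & (W == K_ij) & ~Gp                                # Omega = Omega'' / Gamma'

for name, O, Om in [("Model 3.1", O1, Om1), ("Model 3.2", O2, Om2)]:
    print(name, " P(O) ~", round(O.mean(), 4),
          " P(Omega) ~", round(Om.mean(), 4),
          " P(Omega | O) ~", round(Om[O].mean(), 4))
```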
In the next section, we prove some supporting lemmas which are useful for the
proof of the main theorem.
Lemma 4.1 (Theorem 4.1 of [7]) Suppose Ω0 ∼ Ber(ρ0). Then with high probability,
‖P_T − ρ0^{−1} P_T P_{Ω0} P_T‖ ≤ ε, provided that ρ0 ≥ C0 ε^{−2} (μr log n)/n for some numerical
constant C0 > 0.

Lemma 4.3 (Theorem 6.3 of [7]) Suppose Z is a fixed matrix and Ω0 ∼ Ber(ρ0).
Then with high probability, ‖(ρ0 I − P_{Ω0})Z‖ ≤ C0′ √(np log n) ‖Z‖_∞, provided that
ρ0 ≤ p and p ≥ C0 (log n)/n for some numerical constants C0 > 0 and C0′ > 0.
Notice that we only have ρ0 = p in Theorem 6.3 of [7]. By a very slight modifica-
tion in the proof (specifically, the proof of Lemma 6.2), we can have ρ0 ≤ p as stated
above.
where λ = 1/√(nρ log n), then the solution (L̂, Ŝ) to (1.6) satisfies (L̂, Ŝ) = (L, S).
which implies

λ‖S‖_1 − λ‖P_{O/Γ}(Ŝ)‖_1 ≥ ⟨H, UV∗⟩ + ‖P_{T⊥}(H)‖_∗ + λ‖P_Γ(Ŝ)‖_1.

By the two inequalities above and the fact P_Γ Ŝ = P_Γ(Ŝ − S) = −P_Γ H, we have

‖P_{T⊥}(H)‖_∗ + λ‖P_Γ(H)‖_1 ≤ ⟨H, λP_{O/Γ}(W) − UV∗⟩.                          (4.2)
By inequality (4.2),

(3/4)‖P_{T⊥}(H)‖_∗ + (3λ/4)‖P_Γ(H)‖_1 ≤ (λ/(2n))‖P_T(H)‖_F.                   (4.3)
Recall that we assume ‖(1/((1 − 2s)ρ)) P_T P_Γ P_T − P_T‖ ≤ 1/2 and ‖(1/√((1 − 2s)ρ)) P_T P_Γ‖ ≤
√(3/2) throughout the paper. Then

‖P_T(H)‖_F ≤ 2 ‖(1/((1 − 2s)ρ)) P_T P_Γ P_T(H)‖_F
    ≤ 2 ‖(1/((1 − 2s)ρ)) P_T P_Γ P_{T⊥}(H)‖_F + 2 ‖(1/((1 − 2s)ρ)) P_T P_Γ(H)‖_F
    ≤ √(6/((1 − 2s)ρ)) ‖P_{T⊥}H‖_F + √(6/((1 − 2s)ρ)) ‖P_Γ H‖_F.
Suppose we can construct Y and Y′ satisfying

‖P_T Y + P_T(λP_{Ω′}W − UV∗)‖_F ≤ λ/(2n^2),
‖P_{T⊥}Y + P_{T⊥}(λP_{Ω′}W)‖ ≤ 1/4,
P_{Γ^c}Y = 0,                                                                 (4.4)
‖P_Γ Y‖_∞ ≤ λ/4,

and

‖P_T Y′ + P_T(λ(2P_{Ω′′/Γ}(W) − P_{Ω′}W) − UV∗)‖_F ≤ λ/(2n^2),
‖P_{T⊥}Y′ + P_{T⊥}(λ(2P_{Ω′′/Γ}(W) − P_{Ω′}W))‖ ≤ 1/4,
P_{Γ^c}Y′ = 0,                                                                (4.5)
‖P_Γ Y′‖_∞ ≤ λ/4.
Proof of Theorem 1.3 Notice that Γ ∼ Ber((1 − 2s)ρ). Suppose that q satisfies
1 − (1 − 2s)ρ = (1 − (1 − 2s)ρ/6)^2 (1 − q)^{l−2}, where l = ⌊5 log n⌋ + 1. This implies
that q ≥ Cρ/ log n. Define q_1 = q_2 = (1 − 2s)ρ/6 and q_3 = · · · = q_l = q. Then in
distribution, we can let Γ = Γ_1 ∪ · · · ∪ Γ_l, where Γ_j ∼ Ber(q_j) independently.
Construct

Z_0 = P_T(UV∗ − λP_{Ω′}W),
Z_j = (P_T − q_j^{−1} P_T P_{Γ_j} P_T) Z_{j−1}   for j = 1, . . . , l,
Y = ∑_{j=1}^l q_j^{−1} P_{Γ_j} Z_{j−1}.
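The matrix golfing iteration can be sketched numerically as well. The code below (assuming NumPy) implements the recursion for Z_j and the accumulation of Y, with P_T given by the standard tangent-space formula P_T(M) = UU∗M + MVV∗ − UU∗MVV∗ used in [7, 12]. For brevity, Z_0 is taken to be P_T(UV∗) only (the λP_{Ω′}W term is dropped), the blocks Γ_j are split with equal rates rather than the nonuniform rates above, and all sizes are illustrative.

```python
# Schematic golfing construction of Z_j and Y for matrix completion; assumes NumPy.
# P_T(M) = U U* M + M V V* - U U* M V V* is the tangent-space projection at L = U Sigma V*.
# Z_0 omits the lambda * P_{Omega'}(W) term, and the Gamma_j are split equally: simplifications.
import numpy as np

rng = np.random.default_rng(7)
n, r, rho, l = 200, 2, 0.9, 6
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def P_T(M):
    UUM = U @ (U.T @ M)
    return UUM + (M @ V) @ V.T - (UUM @ V) @ V.T

q = 1.0 - (1.0 - rho) ** (1.0 / l)           # equal splitting: Gamma = Gamma_1 U ... U Gamma_l
Z = P_T(U @ V.T)                             # Z_0 (simplified)
Y = np.zeros((n, n))
for _ in range(l):
    Gamma_j = rng.random((n, n)) < q         # Gamma_j ~ Ber(q_j)
    PZ = np.where(Gamma_j, Z, 0.0)           # P_{Gamma_j} Z_{j-1}
    Y += PZ / q                              # Y += q_j^{-1} P_{Gamma_j} Z_{j-1}
    Z = Z - P_T(PZ) / q                      # Z_j = (P_T - q_j^{-1} P_T P_{Gamma_j} P_T) Z_{j-1}

print("||Z_0||_F =", np.linalg.norm(P_T(U @ V.T)),
      " ||Z_l||_F =", np.linalg.norm(Z),
      " ||P_T(Y) - P_T(U V*)||_F =", np.linalg.norm(P_T(Y) - P_T(U @ V.T)))
```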
‖Z_1‖_∞ ≤ (1/(2√log n)) ‖Z_0‖_∞

and

‖Z_j‖_∞ ≤ (1/(2^j log n)) ‖Z_0‖_∞   for j = 2, . . . , l

with high probability, provided C_ρ is large enough and C_s is small enough. Also, by
Lemma 4.3, we have

‖(I − q_j^{−1} P_{Γ_j}) Z_{j−1}‖ ≤ C √((n log n)/q_j) ‖Z_{j−1}‖_∞   for j = 1, . . . , l
where we have

E[X_j^2] = (2sρ/(1 − ρ + 2sρ)) ‖P_T(e_i e_j^∗)‖_F^2 ≤ Cρs μr/n

and

M = ‖P_T(e_i e_j^∗)‖_∞ ≤ 2μr/n.

Then with high probability, we have ‖P_T P_{Ω′}(W)‖_∞ ≤ C√(ρs μr log n / n) (which is
≥ C√(C_ρ) μr log n / n > C C_ρ M log n). Then by ‖UV∗‖_∞ ≤ √(μr)/n, we have
‖Z_0‖_∞ ≤ C√(μr)/n, which implies ‖Z_0‖_F ≤ n‖Z_0‖_∞ ≤ C√(μr).
Now we want to prove that Y satisfies (4.4) with high probability. Obviously, P_{Γ^c}Y =
0. It suffices to prove

‖P_T Y + P_T(λP_{Ω′}(W) − UV∗)‖_F ≤ λ/(2n^2),
‖P_{T⊥}Y‖ ≤ 1/8,
‖P_{T⊥}(λP_{Ω′}(W))‖ ≤ 1/8,                                                   (4.6)
‖P_Γ Y‖_∞ ≤ λ/4.
First,

‖P_T Y + P_T(λP_{Ω′}(W) − UV∗)‖_F
    = ‖Z_0 − ∑_{j=1}^l q_j^{−1} P_T P_{Γ_j} Z_{j−1}‖_F
    = ‖P_T Z_0 − ∑_{j=1}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
    = ‖(P_T − q_1^{−1} P_T P_{Γ_1} P_T) Z_0 − ∑_{j=2}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
    = ‖P_T Z_1 − ∑_{j=2}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
    = · · · = ‖Z_l‖_F ≤ C√(μr) (1/2)^l ≤ λ/(2n^2).
Second,

‖P_{T⊥}Y‖ = ‖P_{T⊥} ∑_{j=1}^l q_j^{−1} P_{Γ_j} Z_{j−1}‖
    ≤ ∑_{j=1}^l ‖q_j^{−1} P_{T⊥} P_{Γ_j} Z_{j−1}‖
    = ∑_{j=1}^l ‖P_{T⊥}(q_j^{−1} P_{Γ_j} Z_{j−1} − Z_{j−1})‖
    ≤ ∑_{j=1}^l ‖q_j^{−1} P_{Γ_j} Z_{j−1} − Z_{j−1}‖
    ≤ C ∑_{j=1}^l √((n log n)/q_j) ‖Z_{j−1}‖_∞
    ≤ C√(n log n) (∑_{j=3}^l 1/(2^{j−1} log n √q_j) + 1/(2√log n √q_2) + 1/√q_1) ‖Z_0‖_∞
    ≤ C√(nμr log n)/(n√ρ) ≤ 1/(8√log n),
provided Cρ is sufficiently large.
Third, we have λ‖P_{T⊥} P_{Ω′}(W)‖ ≤ λ‖P_{Ω′}(W)‖. Notice that (Wij) is an independent
Rademacher sequence independent of Ω′. By Lemma 4.3, we have

‖(2sρ/(1 − ρ + 2sρ)) W − P_{Ω′}(W)‖ ≤ C0′ √(np log n) ‖W‖_∞

with high probability, provided 2sρ/(1 − ρ + 2sρ) ≤ p and p ≥ C0 (log n)/n. By Theorem 3.9 of
[39], we have ‖W‖ ≤ C1 √n with high probability. Therefore,

‖P_{Ω′}(W)‖ ≤ C0′ √(np log n) + C1 √n · 2sρ/(1 − ρ + 2sρ).

By choosing p = C2 ρ for some appropriate C2, we have ‖P_{Ω′}(W)‖ ≤ √(nρ log n)/8, pro-
vided C_ρ is large enough and C_s is small enough.
Fourth,

‖P_Γ Y‖_∞ = ‖P_Γ ∑_j q_j^{−1} P_{Γ_j} Z_{j−1}‖_∞
    ≤ ∑_j q_j^{−1} ‖Z_{j−1}‖_∞
    ≤ (∑_{j=3}^l 1/(q_j 2^{j−1} log n) + 1/(q_2 2√log n) + 1/q_1) ‖Z_0‖_∞
    ≤ C√(μr)/(nρ) ≤ λ/(4√log n),
provided Cρ is sufficiently large.
Notice that in [12] the authors used a very similar golfing scheme. To compare
these two methods, we use here a golfing scheme of nonuniform sizes to achieve a
result with fewer log factors. Moreover, unlike in [12], where the authors used both
the golfing scheme and the least square method to construct two parts of the dual ma-
trix, here we only use the golfing scheme. Actually, the method to construct the dual
matrix in [12] cannot be applied directly to our problem when ρ = O(r log^2 n/n).
Acknowledgements I am grateful to my Ph.D. advisor, Emmanuel Candès, for his encouragement and
his help in preparing this manuscript.
References
1. Agarwal, A., Negahban, S., Wainwright, M.: Noisy matrix decomposition via convex relaxation: opti-
mal rates in high dimensions. In: Proc. 28th Inter. Conf. Mach. Learn. (ICML), pp. 1129–1136 (2011)
2. Ahlswede, R., Winter, A.: Strong converse for identification via quantum channels. IEEE Trans. Inf.
Theory 48(3), 569–579 (2002)
3. Baraniuk, R., Davenport, M., DeVore, R., Wakin, M.: A simple proof of the restricted isometry prop-
erty for random matrices. Constr. Approx. 28(3), 253–263 (2008)
4. Candès, E., Plan, Y.: Matrix completion with noise. In: Proceedings of the IEEE (2009)
5. Candès, E., Plan, Y.: Near-ideal model selection by 1 minimization. Ann. Stat. 37(5A), 2145–2177
(2009)
6. Candès, E., Plan, Y.: A probabilistic and RIPless theory of compressed sensing. IEEE Trans. Inf.
Theory 57(11), 7235–7254 (2011)
7. Candès, E., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6)
(2009)
8. Candès, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12) (2005)
9. Candès, E., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans.
Inf. Theory 56(5), 2053–2080 (2010)
10. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from
highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
11. Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measure-
ments. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)
12. Candès, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3) (2011)
13. Chandrasekaran, V., Sanghavi, S., Parrilo, P., Willsky, A.: Sparse and low-rank matrix decomposi-
tions. In: 15th IFAC Symposium on System Identification (SYSID) (2009)
14. Chandrasekaran, V., Sanghavi, S., Parrilo, P., Willsky, A.: Rank-sparsity incoherence for matrix de-
composition. SIAM J. Optim. 21(2), 572–596 (2011)
15. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput.
20(1), 33–61 (1998)
16. Chen, Y., Jalali, A., Sanghavi, S., Caramanis, C.: Low-rank matrix recovery from errors and erasures.
ISIT (2011)
17. Davidson, K., Szarek, S.: Local operator theory, random matrices and Banach spaces. Handb. Geom.
Banach Spaces I(8), 317–366 (2001)
18. Donoho, D.: For most large underdetermined systems of linear equations the minimal l1-norm solu-
tion is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)
19. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
20. Fazel, M.: Matrix rank minimization with applications. Ph.D. Thesis (2002)
21. Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory
57(3), 1548–1566 (2011)
22. Gross, D., Liu, Y.-K., Flammia, S., Becker, S., Eisert, J.: Quantum state tomography via compressed
sensing. Phys. Rev. Lett. 105(15) (2010)
23. Haupt, J., Bajwa, W., Rabbat, M., Nowak, R.: Compressed sensing for networked data. IEEE Signal
Process. Mag. 25(2), 92–101 (2008)
24. Hsu, D., Kakade, S., Zhang, T.: Robust matrix decomposition with sparse corruptions. IEEE Trans.
Inf. Theory 57(11), 7221–7234 (2011)
25. Keshavan, R., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory
56(6), 2980–2998 (2010)
26. Laska, J., Davenport, M., Baraniuk, R.: Exact signal recovery from sparsely corrupted measurements
through the pursuit of justice. In: Asilomar Conference on Signals Systems and Computers (2009)
27. Laska, J., Boufounos, P., Davenport, M., Baraniuk, R.: Democracy in action: quantization, saturation,
and compressive sensing. Appl. Comput. Harmon. Anal. 31(3), 429–443 (2011)
28. Li, Z., Wu, F., Wright, J.: On the systematic measurement matrix for compressed sensing in the pres-
ence of gross errors. In: Data Compression Conference, pp. 356–365 (2010)
29. Nguyen, N., Tran, T.: Exact recoverability from dense corrupted observations via l1 minimization.
Preprint (2011)
30. Nguyen, N., Nasrabadi, N., Tran, T.: Robust lasso with missing and grossly corrupted observations.
Preprint (2011)
31. Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
32. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via
nuclear norm minimization. SIAM Rev. 52(3) (2010)
33. Romberg, J.: Compressive sensing by random convolution. SIAM J. Imaging Sci. 2(4), 1098–1128
(2009)
34. Rudelson, M.: Random vectors in the isotropic position. J. Funct. Anal. 164(1), 60–72 (1999)
35. Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements.
Commun. Pure Appl. Math. 61(8), 1025–1045 (2008)
36. Studer, C., Kuppinger, P., Pope, G., Bölcskei, H.: Recovery of sparsely corrupted signals. Preprint
(2011)
37. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1), 267–288
(1996)
38. Tropp, J.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. (2011)
39. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y., Ku-
tyniok, G. (eds.) Compressed Sensing, Theory and Applications, pp. 210–268. Cambridge University
Press, Cambridge (2012), Chap. 5
40. Wright, J., Ma, Y.: Dense error correction via ℓ1-minimization. IEEE Trans. Inf. Theory 56(7), 3540–
3560 (2010)
41. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S., Ma, Y.: Robust face recognition via sparse representa-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
42. Wu, L., Ganesh, A., Shi, B., Matsushita, Y., Wang, Y., Ma, Y.: Robust photometric stereo via low-rank
matrix completion and recovery. In: Proceedings of the 10th Asian Conference on Computer Vision,
Part III (2010)
43. Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. In: Adv. Neural Inf. Process. Syst.
(NIPS), pp. 2496–2504 (2010)