
Constr Approx (2013) 37:73–99

DOI 10.1007/s00365-012-9176-9

Compressed Sensing and Matrix Completion


with Constant Proportion of Corruptions

Xiaodong Li

Received: 4 May 2011 / Revised: 18 January 2012 / Accepted: 18 April 2012 /


Published online: 19 December 2012
© Springer Science+Business Media New York 2012

Abstract In this paper, we improve existing results in the field of compressed sens-
ing and matrix completion when sampled data may be grossly corrupted. We intro-
duce three new theorems. (1) In compressed sensing, we show that if the m × n
sensing matrix has independent Gaussian entries, then one can recover a sparse sig-
nal x exactly by tractable ℓ1 minimization even if a positive fraction of the mea-
surements are arbitrarily corrupted, provided the number of nonzero entries in x is
O(m/(log(n/m) + 1)). (2) In the very general sensing model introduced in Candès
and Plan (IEEE Trans. Inf. Theory 57(11):7235–7254, 2011) and assuming a positive
fraction of corrupted measurements, exact recovery still holds if the signal now has
O(m/log^2 n) nonzero entries. (3) Finally, we prove that one can recover an n × n
low-rank matrix from m corrupted sampled entries by tractable optimization provided
the rank is on the order of O(m/(n log^2 n)); again, this holds when there is a positive
fraction of corrupted samples.

Keywords Compressed sensing · Matrix completion · Robust PCA · Convex


optimization · Restricted isometry property · Golfing scheme

Mathematics Subject Classification Primary: 49K45 · Secondary: 62G35

1 Introduction

1.1 Introduction on Compressed Sensing with Corruptions

Compressed sensing (CS) has been well studied in recent years [10, 19]. This novel
theory asserts that a sparse or approximately sparse signal x ∈ Rn can be acquired

Communicated by Joel A. Tropp.


X. Li ()
Department of Mathematics, Stanford University, Stanford, CA 94305, USA
e-mail: [email protected]

by taking just a few nonadaptive linear measurements. This fact has numerous con-
sequences which are being explored in a number of fields of applied science and
engineering. In CS, the acquisition procedure is often represented as y = Ax, where
A ∈ Rm×n is called the sensing matrix and y ∈ Rm is the vector of measurements
or observations. It is now well established that the solution x̂ to the optimization
problem
    min_{x̃} ‖x̃‖_1   such that   Ax̃ = y    (1.1)

is guaranteed to be the original signal x with high probability, provided x is suffi-
ciently sparse and A obeys certain conditions. A typical result is this: If A has iid
Gaussian entries, then exact recovery occurs provided ‖x‖_0 ≤ Cm/(log(n/m) + 1)
[11, 18, 35] for some positive numerical constant C > 0. Here is another example: If
A is a matrix with rows randomly selected from the Discrete Fourier Transform
(DFT) matrix, the condition becomes ‖x‖_0 ≤ Cm/log n [10].
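To make the recovery program (1.1) concrete, here is a minimal numerical sketch of basis pursuit. It is not part of the original paper; the sizes, sparsity level, random seed, and the use of the cvxpy package (with its default conic solver) are illustrative assumptions only.

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes only: n-dimensional signal, m Gaussian measurements.
n, m, k = 200, 80, 10
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n)) / np.sqrt(m)    # iid Gaussian sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
y = A @ x_true                                   # clean measurements

x_var = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x_var)), [A @ x_var == y])  # program (1.1)
problem.solve()

print("recovery error:", np.linalg.norm(x_var.value - x_true))
```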
This paper discusses a natural generalization of CS, which we shall refer to as
compressed sensing with corruptions. We assume that some entries of the data vector
y are totally corrupted but we have absolutely no idea which entries are unreliable.
We still want to recover the original signal efficiently and accurately. Formally, we
have the mathematical model

    y = Ax + f = [A, I] [x; f],    (1.2)

where x ∈ R^n and f ∈ R^m. The number of nonzero coefficients in x is ‖x‖_0, and sim-
ilarly for f . As in the above model, A is an m × n sensing matrix, usually sampled
from a probability distribution. The problem of recovering x (and hence f ) from y
has been recently studied in the literature in connection with some interesting appli-
cations. We discuss a few of them.
• Clipping. Signal clipping frequently appears because of nonlinearities in the acqui-
sition device [27, 36]. Here, one typically measures g(Ax) rather than Ax, where g
is always a nonlinear map. Letting f = g(Ax) − Ax, we thus observe y = Ax + f .
Nonlinearities usually occur at large amplitudes so that for those components with
small amplitudes, we have f = g(Ax) − Ax = 0. This means that f is sparse and,
therefore, our model is appropriate. Just as before, locating the portion of the data
vector that has been clipped may be difficult because of additional noise.
• CS for networked data. In a sensor network, different sensors will collect measure-
ments of the same signal x independently (they each measure z_i = ⟨a_i, x⟩) and
send the outcome to a central hub for analysis [23, 29]. By setting a_i as the row
vectors of A, this is just z = Ax. However, typically some sensors will fail to send
the measurements correctly, and will sometimes report totally meaningless mea-
surements. Therefore, we collect y = Ax + f , where f models recording errors.
There have been several theoretical papers investigating the exact recovery method
for CS with corruptions [26, 28, 29, 36, 40], and all of them consider the following
recovery procedure in the noiseless case:
 
    min_{x̃,f̃}  ‖x̃‖_1 + λ(m, n)‖f̃‖_1   such that   Ax̃ + f̃ = [A, I] [x̃; f̃] = y.    (1.3)

We will compare them with our results in Sect. 1.4.
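For illustration, the extended program (1.3) can be prototyped in the same way as (1.1). The sketch below is an assumption-laden example (cvxpy again, arbitrary sizes, and λ set to 1/√(log n), one of the choices analyzed later); it jointly recovers the signal and the corruption.

```python
import numpy as np
import cvxpy as cp

n, m, k, k_f = 200, 80, 8, 10                    # illustrative sizes only
rng = np.random.default_rng(1)

A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
f_true = np.zeros(m); f_true[rng.choice(m, k_f, replace=False)] = 10 * rng.standard_normal(k_f)
y = A @ x_true + f_true                           # corrupted measurements, model (1.2)

lam = 1.0 / np.sqrt(np.log(n))                    # an assumed choice of lambda(m, n)
x_var, f_var = cp.Variable(n), cp.Variable(m)
objective = cp.Minimize(cp.norm1(x_var) + lam * cp.norm1(f_var))
constraints = [A @ x_var + f_var == y]            # A x~ + f~ = y, i.e., program (1.3)
cp.Problem(objective, constraints).solve()

print("x error:", np.linalg.norm(x_var.value - x_true))
print("f error:", np.linalg.norm(f_var.value - f_true))
```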



1.2 Introduction on Matrix Completion with Corruptions

Matrix completion (MC) bears some similarity to CS. Here, the goal is to recover a
low-rank matrix L ∈ Rn×n from a small fraction of linear measurements. For simplic-
ity, we suppose the matrix is square as above (the general case is similar). The stan-
dard model is that we observe PO (L), where O ⊂ [n] × [n] := {1, . . . , n} × {1, . . . , n}
and

    P_O(L)_{ij} = L_{ij}  if (i, j) ∈ O,   and   P_O(L)_{ij} = 0  otherwise.
The problem is to recover the original matrix L, and there have been many papers
studying this problem in recent years, see [7, 9, 21, 25, 32], for example. Here one
minimizes the nuclear norm—the sum of all the singular values [20]—to recover the
original low-rank matrix. We discuss below an improved result due to Gross [21]
(with a slight difference).
Define O ∼ Ber(ρ) for some 0 < ρ < 1 to mean that 1{(i,j )∈O} are iid Bernoulli
random variables with parameter ρ. Then the solution to

    min_{L̃}  ‖L̃‖_*   such that   P_O(L̃) = P_O(L)    (1.4)

is guaranteed to be exactly L with high probability, provided ρ ≥ C_ρ μr log^2 n / n. Here,
Cρ is a positive numerical constant, r is the rank of L, and μ is an incoherence
parameter introduced in [7] which is only dependent on L.
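As an illustration of (1.4) (not from the paper; the sizes, sampling rate, and the cvxpy-based solver are assumptions), the nuclear-norm program can be prototyped as follows.

```python
import numpy as np
import cvxpy as cp

n, r, rho = 60, 2, 0.4                            # illustrative sizes and sampling rate
rng = np.random.default_rng(2)

L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # a rank-r matrix
O = (rng.random((n, n)) < rho).astype(float)                          # O ~ Ber(rho), as a 0/1 mask

L_var = cp.Variable((n, n))
constraints = [cp.multiply(O, L_var) == O * L_true]                   # P_O(L~) = P_O(L)
problem = cp.Problem(cp.Minimize(cp.normNuc(L_var)), constraints)
problem.solve()

print("relative error:", np.linalg.norm(L_var.value - L_true) / np.linalg.norm(L_true))
```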
This paper is concerned with the situation in which some entries may have been
corrupted. Therefore, our model is that we observe

PO (L) + S, (1.5)

where O and L are the same as before and S ∈ Rn×n is supported on Ω ⊂ O. Just as
in CS, this model has broad applicability. For example, Wu et al. used this model in
photometric stereo [42]. This problem has also been introduced in [12] and is related
to recent work in separating a low-rank from a sparse component [12–14, 24, 43].

A typical result is that the solution (L̂, Ŝ) to

    min_{L̃,S̃}  ‖L̃‖_* + λ(m, n)‖S̃‖_1   such that   P_O(L̃) + S̃ = P_O(L) + S    (1.6)

is guaranteed to be the true pair (L, S) with high probability under some assumptions
about L, O, S [12, 16]. We will compare them with our result in Sect. 1.4.
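Similarly, here is a small prototype of the robust program (1.6). It is a sketch under assumed parameters (sizes, corruption magnitudes, random seed), using cvxpy, with λ set as in Theorem 1.3 below.

```python
import numpy as np
import cvxpy as cp

n, r, rho, s = 60, 2, 0.5, 0.1                    # illustrative parameters only
rng = np.random.default_rng(3)

L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
O = (rng.random((n, n)) < rho).astype(float)                  # observed entries, O ~ Ber(rho)
corrupt = (rng.random((n, n)) < s) * O                        # a fraction s of the observed entries
S_true = corrupt * rng.choice([-5.0, 5.0], size=(n, n))       # sparse gross corruptions
Y_obs = O * L_true + S_true                                   # the data P_O(L) + S

lam = 1.0 / np.sqrt(rho * n * np.log(n))                      # lambda as in Theorem 1.3 (assumed here)
L_var, S_var = cp.Variable((n, n)), cp.Variable((n, n))
objective = cp.Minimize(cp.normNuc(L_var) + lam * cp.sum(cp.abs(S_var)))
constraints = [cp.multiply(O, L_var) + S_var == Y_obs]        # P_O(L~) + S~ = P_O(L) + S
cp.Problem(objective, constraints).solve()

print("relative error in L:", np.linalg.norm(L_var.value - L_true) / np.linalg.norm(L_true))
```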

1.3 Main Results

This section introduces three models and three corresponding recovery results. The
proofs of these results are deferred to Sect. 2 for Theorem 1.1, Sect. 3 for Theo-
rem 1.2, and Sect. 4 for Theorem 1.3.

1.3.1 CS with iid Matrices [Model 1]

Theorem 1.1 Suppose that A is an m × n (m < n) random matrix whose entries
are iid Gaussian variables with mean 0 and variance 1/m, the signal to acquire is
x ∈ R^n, and our observation is y = Ax + f + w, where f, w ∈ R^m and ‖w‖_2 ≤ ε.
Then by choosing λ(n, m) = 1/√(log(n/m) + 1), the solution (x̂, f̂) to

    min_{x̃,f̃}  ‖x̃‖_1 + λ‖f̃‖_1   such that   ‖(Ax̃ + f̃) − y‖_2 ≤ ε    (1.7)

satisfies ‖x̂ − x‖_2 + ‖f̂ − f‖_2 ≤ Kε with probability at least 1 − C exp(−cm).
This holds universally; that is to say, for all vectors x and f obeying ‖x‖_0 ≤
αm/(log(n/m) + 1) and ‖f‖_0 ≤ αm. Here α, C, c, and K are numerical constants.

In the above statement, the matrix A is random. Everything else is deterministic.


The reader will notice that the number of nonzero entries is on the same order as that
needed for recovery from clean data [3, 11, 19, 35], while the condition on f implies
that one can tolerate a constant fraction of possibly adversarial errors. Moreover, our
convex optimization is related to LASSO [37] and Basis Pursuit [15].
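The noisy program (1.7) from Theorem 1.1 can be prototyped the same way. The sketch below is purely illustrative (cvxpy, assumed sizes, an assumed noise level and corruption pattern), with λ chosen as in the theorem.

```python
import numpy as np
import cvxpy as cp

n, m = 400, 120                                   # illustrative sizes
rng = np.random.default_rng(4)
A = rng.standard_normal((m, n)) / np.sqrt(m)

k = max(2, int(0.05 * m / (np.log(n / m) + 1)))   # a sparsity level inside the Theorem 1.1 regime
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
f_true = np.zeros(m); f_true[rng.choice(m, m // 10, replace=False)] = 5.0  # ~10% corrupted entries
w = 0.01 * rng.standard_normal(m); eps = np.linalg.norm(w)
y = A @ x_true + f_true + w

lam = 1.0 / np.sqrt(np.log(n / m) + 1)            # lambda(n, m) from Theorem 1.1
xv, fv = cp.Variable(n), cp.Variable(m)
cp.Problem(cp.Minimize(cp.norm1(xv) + lam * cp.norm1(fv)),
           [cp.norm(A @ xv + fv - y, 2) <= eps]).solve()

print("recovery error:", np.linalg.norm(xv.value - x_true) + np.linalg.norm(fv.value - f_true))
```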

1.3.2 CS with General Sensing Matrices [Model 2]

In this model, m < n and

    A = (1/√m) [a_1^*; a_2^*; … ; a_m^*],

where a_1, …, a_m are iid copies of a random vector a ∈ R^n whose distribution obeys
the following two properties: (1) E aa^* = I; (2) ‖a‖_∞^2 ≤ μ. This model has been
introduced in [6] and includes a lot of the stochastic models used in the literature.
Examples include partial DFT matrices, matrices with iid entries, certain random
convolutions [33] and so on.
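As a concrete instance of this sensing model (an illustration, not taken from the paper; sizes and the sampling are assumptions), the sketch below draws rows from the unitary n × n DFT matrix, for which the isotropy E aa^* = I holds exactly and the coherence bound is μ = 1.

```python
import numpy as np

n, m = 256, 64                                    # sizes are assumptions for illustration
rng = np.random.default_rng(5)

F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary n x n DFT matrix
rows = rng.integers(0, n, size=m)                 # sample m rows uniformly (with replacement)
a_rows = np.sqrt(n) * F[rows, :]                  # sampled rows a_i^*, satisfying E[a a^*] = I
A = a_rows / np.sqrt(m)                           # Model 2 sensing matrix A = m^{-1/2}[a_1^*; ...; a_m^*]

mu = np.max(np.abs(a_rows)) ** 2                  # coherence bound mu = max_j |a[j]|^2 (= 1 here)
empirical_isotropy = np.linalg.norm(a_rows.conj().T @ a_rows / m - np.eye(n), 2)
print("mu =", mu, " finite-sample isotropy gap:", empirical_isotropy)
```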
In this model, we assume that x and f in (1.2) have fixed support denoted by T
and B, and with cardinality |T | = s and |B| = mb . In the remainder of the paper, xT
is the restriction of x to indices in T and fB is the restriction of f to B. Our main
assumption here concerns the sign sequences: the sign sequences of xT and fB are
independent of each other, and each is a sequence of symmetric iid ±1 variables.

Theorem 1.2 For the model above, the solution (x̂, f̂) to (1.3), with λ(n, m) =
1/√(log n), is exact with probability at least 1 − Cn^{−3}, provided that s ≤ αm/(μ log^2 n)
and m_b ≤ βm/μ. Here C, α, and β are some numerical constants.

Above, x and f have fixed supports and random signs. However, by a recent deran-
domization technique first introduced in [12], exact recovery with random supports
and fixed signs would also hold. We will explain this derandomization technique in
the proof of Theorem 1.3. In some specific models, such as independent rows from

the DFT matrix, μ could be a numerical constant, which implies the proportion of
corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in
the case where x and f have both fixed supports and signs. Another open problem is
to know whether the result would hold under more general conditions about A as in
[5] in the case where x has both random support and random signs.
We emphasize that the sparsity condition ‖x‖_0 ≤ Cm/(μ log^2 n) is a little stronger
than the optimal result available in the noise-free literature [6, 10], namely,
‖x‖_0 ≤ Cm/(μ log n). The extra logarithmic factor appears to be important in the proof
that we explain in Sect. 3, and a third open problem is whether or not it is possible to
remove this factor.
Here we do not give a sensitivity analysis for the recovery procedure as in Model 1.
Actually, by applying a similar method introduced in [6] to our argument in Sect. 3, a
very good error bound could be obtained in the noisy case. However, this adds little technical
novelty and would make the paper considerably longer. Therefore, we decided to only
discuss the noiseless case and focus on the analysis of sampling rate and corruption
ratio.

1.3.3 MC from Corrupted Entries [Model 3]

We assume L is of rank r and write its reduced singular value decomposition (SVD)
as L = UΣV^*, where U, V ∈ R^{n×r} and Σ ∈ R^{r×r}. Let μ be the smallest quantity
such that for all 1 ≤ i ≤ n,

    ‖UU^* e_i‖_2^2 ≤ μr/n,   ‖VV^* e_i‖_2^2 ≤ μr/n,   and   ‖UV^*‖_∞ ≤ √(μr)/n.
This model is the same as that originally introduced in [7] and later used in [9, 12, 16,
21, 31]. We observe P_O(L) + S, where O ⊂ [n] × [n] and S is supported on Ω ⊂ O.
Here we assume that O, Ω, S satisfy the following model:

Model 3.1
1. Fix an n by n matrix K, whose entries are either 1 or −1.
2. Define O ∼ Ber(ρ) for a constant ρ satisfying 0 < ρ < 1/2. Specifically,
   1_{(i,j)∈O} are iid Bernoulli random variables with parameter ρ.
3. Conditioning on (i, j) ∈ O, assume that {(i, j) ∈ Ω} are independent events with
   P((i, j) ∈ Ω | (i, j) ∈ O) = s. This implies that Ω ∼ Ber(ρs).
4. Define Γ := O \ Ω. Then we have Γ ∼ Ber(ρ(1 − s)).
5. Let S be supported on Ω, and sgn(S) := P_Ω(K). (A small sampling sketch of this model is given below.)
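The sampling model can be simulated directly; here is a minimal sketch (sizes, the ±1 pattern K, and the corruption magnitudes are illustrative assumptions only).

```python
import numpy as np

n, rho, s = 100, 0.3, 0.1                         # illustrative parameters, rho < 1/2
rng = np.random.default_rng(6)

K = rng.choice([-1, 1], size=(n, n))              # step 1: fixed sign pattern K
O = rng.random((n, n)) < rho                      # step 2: O ~ Ber(rho)
Omega = O & (rng.random((n, n)) < s)              # step 3: P((i,j) in Omega | (i,j) in O) = s
Gamma = O & ~Omega                                # step 4: Gamma = O \ Omega

S = np.zeros((n, n))                              # step 5: S supported on Omega, sgn(S) = P_Omega(K)
S[Omega] = 5.0 * K[Omega]                         # magnitudes are arbitrary here

print("empirical rho:", O.mean(), " empirical rho*s:", Omega.mean())
```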

Theorem 1.3 Under Model 3.1, suppose ρ > C_ρ μr log^2 n / n and s ≤ C_s. Moreover, sup-
pose λ := 1/√(ρn log n), and denote by (L̂, Ŝ) the optimal solution to the problem (1.6).
Then we have (L̂, Ŝ) = (L, S) with probability at least 1 − Cn^{−3} for some numeri-
cal constant C, provided the numerical constant C_s is sufficiently small and C_ρ is
sufficiently large.

In this model, O is available while Ω, Γ and S are not known explicitly from
the observation P_O(L) + S. By the assumption O ∼ Ber(ρ), we can use |O|/n^2
to approximate ρ. From the following proof we can see that λ is not required to be
exactly 1/√(ρn log n) for exact recovery. The power of our result is that one can recover
a low-rank matrix from a nearly minimal number of samples even when a constant
proportion of these samples has been corrupted.
We only discuss the noiseless case for this model. Actually by a method similar
to [5], a suboptimal estimation error bound can be obtained by a slight modification
of our argument. However, it is of little technical interest and far from the optimal
result when n is large. There are other suboptimal results for matrix completion with
noise, such as [1], but the error bound is not tight when the additional noise is small.
We want to focus on the noiseless case in this paper and leave the problem with noise
for future work.
The values of λ above are chosen to guarantee exact recovery theoretically in Theo-
rems 1.1, 1.2, and 1.3. In practice, λ is usually chosen by cross-validation.

1.4 Comparison with Existing Results, Relative Works, and Our Contribution

In this section we will compare Theorems 1.1, 1.2, and 1.3 with existing results in
the literature.
We begin with Model 1. In [40], Wright and Ma discussed a model where the
sensing matrix A has independent columns with common mean μ and normal pertur-
bations with variance σ^2/m. They chose λ(m, n) = 1 and proved that (x̂, f̂) = (x, f)
with high probability, provided ‖x‖_0 ≤ C_1(σ, n/m) m, ‖f‖_0 ≤ C_2(σ, n/m) m, and f
has random signs. Here C_1(σ, n/m) is much smaller than C/(log(n/m) + 1). We no-
tice that since the authors of [40] talked about a different model, which is motivated
by [41], it may not be comparable with ours directly. However, for our motivation of
CS with corruptions, we assume A has a symmetric distribution and obtain a better
sampling rate.
A bit later, Laska et al. [26] and Li et al. [28] also studied this problem. By setting
λ(m, n) = 1, both papers establish that for Gaussian (or sub-Gaussian) sensing matri-
ces A, if m > C(‖x‖_0 + ‖f‖_0) log((n + m)/(‖x‖_0 + ‖f‖_0)), then the recovery is ex-
act. This follows from the fact that [A, I] obeys a restricted isometry property known
to guarantee exact recovery of sparse vectors via ℓ1 minimization. Furthermore, the
sparsity requirement on x is the same as that found in the standard CS literature,
namely, ‖x‖_0 ≤ Cm/(log(n/m) + 1). However, the result does not allow a positive
fraction of corruptions. For example, if m = √n, we have ‖f‖_0/m ≤ 2/log n, which
goes to zero as n goes to infinity.
As for Model 2, an interesting piece of work [29] (and later [30] on the noisy
case) appeared during the preparation of this paper. These papers discuss models in
which A is formed by selecting rows from an orthogonal matrix with low incoherence
parameter μ, which is the minimum value such that n|Aij |2 ≤ μ for any i, j . The
main result states that selecting λ = n/(Cμm log n) gives exact recovery under
the following assumptions: (1) the rows of A are chosen from an orthogonal matrix
uniformly at random; (2) x is a random signal with independent signs and equally
likely to be either ±1; (3) the support of f is chosen uniformly at random. (By the

derandomization technique introduced in [12] and used in [29], it would have been
sufficient to assume that the signs of f are independent and take on the values ±1
with equal probability). Finally, the sparsity conditions require m ≥ Cμ^2 ‖x‖_0 (log n)^2
and ‖f‖_0 ≤ Cm, which are nearly optimal, for the best known sparsity condition
when f = 0 is m ≥ Cμ‖x‖_0 log n. In other words, the result is optimal up to an extra
factor of μ log n; the sparsity condition about f is of course nearly optimal.
However, the model for A does not include some models frequently discussed
in the literature such as subsampled tight or continuous frames. Against this back-
ground, a recent paper of Candès and Plan [6] considers a very general framework,
which includes a lot of common models in the literature. Theorem 1.2 in our paper
is similar to Theorem 1 in [29]. It assumes similar sparsity conditions, but is based
on this much broader and more applicable model introduced in [6]. Notice that we
require m ≥ Cμ‖x‖_0 (log n)^2, whereas [29] requires m ≥ Cμ^2 ‖x‖_0 (log n)^2. There-
fore, we improve the condition by a factor of μ, which is always at least 1 and can
be as large as n. However, our result imposes ‖f‖_0 ≤ Cm/μ, which is worse than
‖f‖_0 ≤ γm by the same factor. In [29], the parameter λ depends upon μ, while our
λ is only a function of m and n. This is why the results differ, and we prefer to use
a value of λ that does not depend on μ because in some applications, an accurate
estimate of μ may be difficult to obtain. In addition, we use different techniques of
proof in which the clever golfing scheme of [21] is exploited.
Sparse approximation is another problem of an underdetermined linear system
where the dictionary matrix A is always assumed to be deterministic. Readers inter-
ested in this problem (which always requires stronger sparsity conditions) may also
want to study the recent paper [36] by Studer et al. There, the authors introduce a
more general problem of the form y = Ax + Bf and analyze the performance of
ℓ1-recovery techniques by using ideas which have been popularized under the name
of generalized uncertainty principles in the basis pursuit and sparse approximation
literature.
As for Model 3, Theorem 1.3 is a significant extension of the results presented in
[12], in which the authors have a stringent requirement ρ = 0.1. In a very recent and
independent work [16], the authors consider a model where both O and Ω are unions
of stochastic and deterministic subsets, while we only assume the stochastic model.
We recommend that interested readers read the paper for the details. However, only
considering their results on stochastic O and Ω, a direct comparison shows that the
number of samples we need is less than that in this reference. The difference is several
logarithmic factors. Actually, the requirement of ρ in our paper is optimal even for
clean data in the literature of MC. Finally, we want to emphasize that the random
support assumption is essential in Theorem 1.3 when the rank is large. Examples can
be found in [24].
We wish to close our introduction with a few words concerning the techniques of
proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted
isometry, which is a standard technique in the literature of CS. However, our argu-
ment involves a generalization of the restricted isometry concept. The proofs of The-
orems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered
by David Gross [21] and later used in [6, 12, 31] to construct dual certificates. Our
proof leverages results from [12]. However, we contribute novel elements by finding

an appropriate way to phrase sufficient optimality conditions, which are amenable to


the golfing scheme. Details are presented in the following sections.

2 A Proof of Theorem 1.1

In the proof of Theorem 1.1, we will see the notation PT x. Here x is a k-dimensional
vector and T is a subset of {1, . . . , k}. We also use T to represent the subspace of
all k-dimensional vectors supported on T . Then PT x is the projection of x onto the
subspace T , which is to keep the value of x on the support T and to change other
elements into zeros. In this section, we use the floor-function notation ⌊·⌋ to
represent the integer part of any real number.
First we generalize the concept of the restricted isometry property (RIP) [8]:

Definition 2.1 For any matrix Φ ∈ R^{l×(n+m)}, define the RIP-constant δ_{s1,s2} as the
infimum value of δ such that

    (1 − δ)(‖x‖_2^2 + ‖f‖_2^2) ≤ ‖Φ[x; f]‖_2^2 ≤ (1 + δ)(‖x‖_2^2 + ‖f‖_2^2)

holds for any x ∈ R^n with |supp(x)| ≤ s_1 and f ∈ R^m with |supp(f)| ≤ s_2.
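The constant δ_{s1,s2} is a worst case over all supports, so computing it exactly is combinatorial. The sketch below (an illustration only, not from the paper) estimates a Monte Carlo lower bound by sampling random supports and coefficients and measuring the distortion of ‖Φ[x; f]‖_2^2.

```python
import numpy as np

def rip_lower_bound(Phi, n, m, s1, s2, trials=2000, seed=0):
    """Monte Carlo LOWER bound on the RIP-constant delta_{s1,s2}; the true constant can be larger."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(n); f = np.zeros(m)
        x[rng.choice(n, s1, replace=False)] = rng.standard_normal(s1)
        f[rng.choice(m, s2, replace=False)] = rng.standard_normal(s2)
        v = np.concatenate([x, f])
        ratio = np.linalg.norm(Phi @ v) ** 2 / np.linalg.norm(v) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst

# Example with Phi = [A, I] and A as in Model 1 (all sizes are assumptions).
n, m, s1, s2 = 200, 80, 5, 8
rng = np.random.default_rng(1)
A = rng.standard_normal((m, n)) / np.sqrt(m)
Phi = np.hstack([A, np.eye(m)])
print("lower bound on delta_{s1,s2}:", rip_lower_bound(Phi, n, m, s1, s2))
```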

Lemma 2.2 For any x_1, x_2 ∈ R^n and f_1, f_2 ∈ R^m such that supp(x_1) ∩ supp(x_2) = ∅,
|supp(x_1)| + |supp(x_2)| ≤ s_1, and supp(f_1) ∩ supp(f_2) = ∅, |supp(f_1)| + |supp(f_2)|
≤ s_2, we have

    |⟨Φ[x_1; f_1], Φ[x_2; f_2]⟩| ≤ δ_{s1,s2} √(‖x_1‖_2^2 + ‖f_1‖_2^2) √(‖x_2‖_2^2 + ‖f_2‖_2^2).

Proof First, we suppose ‖x_1‖_2^2 + ‖f_1‖_2^2 = ‖x_2‖_2^2 + ‖f_2‖_2^2 = 1. By the definition of
δ_{s1,s2}, we have

    2(1 − δ_{s1,s2}) ≤ ⟨Φ[x_1 + x_2; f_1 + f_2], Φ[x_1 + x_2; f_1 + f_2]⟩ ≤ 2(1 + δ_{s1,s2})

and

    2(1 − δ_{s1,s2}) ≤ ⟨Φ[x_1 − x_2; f_1 − f_2], Φ[x_1 − x_2; f_1 − f_2]⟩ ≤ 2(1 + δ_{s1,s2}).

By the above inequalities (via the polarization identity), we have |⟨Φ[x_1; f_1], Φ[x_2; f_2]⟩| ≤ δ_{s1,s2},
and hence by homogeneity, |⟨Φ[x_1; f_1], Φ[x_2; f_2]⟩| ≤ δ_{s1,s2} √(‖x_1‖_2^2 + ‖f_1‖_2^2) √(‖x_2‖_2^2 + ‖f_2‖_2^2)
without the norm assumption. □

Lemma 2.3 Suppose Φ ∈ R^{l×(n+m)} has RIP-constant δ_{2s1,2s2} < 1/9 (s_1, s_2 > 0) and
λ is between (1/2)√(s_1/s_2) and 2√(s_1/s_2). Then for any x ∈ R^n with |supp(x)| ≤ s_1, any f ∈ R^m
with |supp(f)| ≤ s_2, and any w ∈ R^m with ‖w‖_2 ≤ ε, the solution (x̂, f̂) to the
optimization problem (1.7) satisfies

    ‖x̂ − x‖_2 + ‖f̂ − f‖_2 ≤ 4√(13 + 13δ_{2s1,2s2}) ε / (1 − 9δ_{2s1,2s2}).

Proof Write Δx = x̂ − x and Δf = f̂ − f. Then by (1.7), we have

    ‖Φ[Δx; Δf]‖_2 ≤ ‖w‖_2 + ‖Φ[x̂; f̂] − (Φ[x; f] + w)‖_2 ≤ 2ε.

It is easy to check that the original (x, f) satisfies the inequality constraint in (1.7),
so we have

    ‖x + Δx‖_1 + λ‖f + Δf‖_1 ≤ ‖x‖_1 + λ‖f‖_1.    (2.1)

Then it suffices to show ‖Δx‖_2 + ‖Δf‖_2 ≤ 4√(13 + 13δ_{2s1,2s2}) ε / (1 − 9δ_{2s1,2s2}).
1 2
Choose T_0 with |T_0| = s_1 such that supp(x) ⊂ T_0. Write T_0^c = T_1 ∪ ··· ∪ T_l,
where |T_1| = ··· = |T_{l−1}| = s_1 and |T_l| ≤ s_1. Moreover, suppose T_1 contains the in-
dices of the s_1 largest (in absolute value) coefficients of P_{T_0^c} Δx, T_2
contains the indices of the s_1 largest coefficients of P_{(T_0 ∪ T_1)^c} Δx, and so on. Simi-
larly, define V_0 such that supp(f) ⊂ V_0 and |V_0| = s_2, and divide V_0^c = V_1 ∪ ··· ∪ V_k
in the same way. By this setup, we easily have

    Σ_{j≥2} ‖P_{T_j} Δx‖_2 ≤ s_1^{−1/2} ‖P_{T_0^c} Δx‖_1    (2.2)

and

    Σ_{j≥2} ‖P_{V_j} Δf‖_2 ≤ s_2^{−1/2} ‖P_{V_0^c} Δf‖_1.    (2.3)

On the other hand, by the assumption supp(x) ⊂ T_0 and supp(f) ⊂ V_0, we have

    ‖x + Δx‖_1 = ‖P_{T_0} x + P_{T_0} Δx‖_1 + ‖P_{T_0^c} Δx‖_1 ≥ ‖x‖_1 − ‖P_{T_0} Δx‖_1 + ‖P_{T_0^c} Δx‖_1,    (2.4)

and similarly,

    ‖f + Δf‖_1 ≥ ‖f‖_1 − ‖P_{V_0} Δf‖_1 + ‖P_{V_0^c} Δf‖_1.    (2.5)

By inequalities (2.1), (2.4), and (2.5), we have

    ‖P_{T_0^c} Δx‖_1 + λ‖P_{V_0^c} Δf‖_1 ≤ ‖P_{T_0} Δx‖_1 + λ‖P_{V_0} Δf‖_1.    (2.6)


  x 
By the definition of δ2s1 ,2s2 , the fact Φ f  ≤ 2, and Lemma 2.2, we have
2

(1 − δ2s1 ,2s2 ) PT0 x + PT1 x22 + PV0 f + PV1 f 22
  2
 PT0 x + PT1 x 
≤ Φ 
PV0 f + PV1 f 2
      
PT0 x + PT1 x x PT2 x + · · · + PTl x
= Φ ,Φ −Φ
PV0 f + PV1 f f PV2 f + · · · + PVk f
    
PT0 x + PT1 x PT2 x + · · · + PTl x
≤− Φ ,Φ
PV0 f + PV1 f PV2 f + · · · + PVk f
82 Constr Approx (2013) 37:73–99
  
 PT0 x + PT1 x 
+ 2   Φ 
PV0 f + PV1 f 2
     
 PT x    
≤ δ2s1 ,2s2  0  +  PT1 x  P x + P f 
 PV f   PV f  Tj 2 Vj 2
0 2 1 2 j ≥2 j ≥2

+ 2 1 + δ2s1 ,2s2 PT0 x22 + PT1 x22 + PV0 f 22 + PV1 f 22 .

Moreover, since

    Σ_{j≥2} ‖P_{T_j} Δx‖_2 + Σ_{j≥2} ‖P_{V_j} Δf‖_2
      ≤ s_1^{−1/2} ‖P_{T_0^c} Δx‖_1 + s_2^{−1/2} ‖P_{V_0^c} Δf‖_1              (by (2.2) and (2.3))
      ≤ 2 s_1^{−1/2} (‖P_{T_0^c} Δx‖_1 + λ‖P_{V_0^c} Δf‖_1)                    (by λ > (1/2)√(s_1/s_2))
      ≤ 2 s_1^{−1/2} (‖P_{T_0} Δx‖_1 + λ‖P_{V_0} Δf‖_1)                        (by (2.6))
      ≤ 2 s_1^{−1/2} (s_1^{1/2} ‖P_{T_0} Δx‖_2 + λ s_2^{1/2} ‖P_{V_0} Δf‖_2)   (by the Cauchy–Schwarz inequality)
      ≤ 4‖P_{T_0} Δx‖_2 + 4‖P_{V_0} Δf‖_2                                      (by λ < 2√(s_1/s_2)),
we have

    (‖[P_{T_0} Δx; P_{V_0} Δf]‖_2 + ‖[P_{T_1} Δx; P_{V_1} Δf]‖_2) (Σ_{j≥2} ‖P_{T_j} Δx‖_2 + Σ_{j≥2} ‖P_{V_j} Δf‖_2)
      ≤ 8 (‖P_{T_0} Δx‖_2^2 + ‖P_{T_1} Δx‖_2^2 + ‖P_{V_0} Δf‖_2^2 + ‖P_{V_1} Δf‖_2^2).

Therefore, by δ_{2s1,2s2} < 1/9, we have

    √(‖P_{T_0} Δx‖_2^2 + ‖P_{T_1} Δx‖_2^2 + ‖P_{V_0} Δf‖_2^2 + ‖P_{V_1} Δf‖_2^2) ≤ 2ε √(1 + δ_{2s1,2s2}) / (1 − 9δ_{2s1,2s2}).

Since

    Σ_{j≥2} ‖P_{T_j} Δx‖_2 + Σ_{j≥2} ‖P_{V_j} Δf‖_2 ≤ 4‖P_{T_0} Δx‖_2 + 4‖P_{V_0} Δf‖_2,

we have

    ‖Δx‖_2 + ‖Δf‖_2 ≤ 5(‖P_{T_0} Δx‖_2 + ‖P_{V_0} Δf‖_2) + (‖P_{T_1} Δx‖_2 + ‖P_{V_1} Δf‖_2)
      ≤ √52 √(‖P_{T_0} Δx‖_2^2 + ‖P_{T_1} Δx‖_2^2 + ‖P_{V_0} Δf‖_2^2 + ‖P_{V_1} Δf‖_2^2)
      ≤ 4√(13 + 13δ_{2s1,2s2}) ε / (1 − 9δ_{2s1,2s2}).   □

We now cite a well-known result in the literature of CS, e.g., Theorem 5.2 of [3].

Lemma 2.4 Suppose A is a random matrix defined in Model 1. Then for any 0 < δ <
1, there exist c_1(δ), c_2(δ) > 0 such that with probability at least 1 − 2 exp(−c_2(δ)m),

    (1 − δ)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ)‖x‖_2^2

holds universally for any x with |supp(x)| ≤ c_1(δ) m/(log(n/m) + 1).

Also, we cite a well-known bound on the largest singular value of a random matrix,
e.g., [17] and [39].

Lemma 2.5 Let B be an m × n matrix whose entries are independent standard normal
random variables. Then for every t ≥ 0, with probability at least 1 − 2 exp(−t^2/2),
one has ‖B‖_{2,2} ≤ √m + √n + t.

We now prove Theorem 1.1.

Proof Suppose α, δ are two constants independent of m and n, whose values will
be specified later. Set s_1 = ⌊αm/(log(n/m) + 1)⌋ and s_2 = ⌊αm⌋. We want to bound the RIP-
constant δ_{2s1,2s2} of the m × (n + m) matrix Φ = [A, I] when α is sufficiently small.
For any T with |T| = 2s_1 and V with |V| = 2s_2, and any x with supp(x) ⊂ T and f
with supp(f) ⊂ V, we have

    ‖[A, I][x; f]‖_2^2 = ‖Ax + f‖_2^2 = ‖Ax‖_2^2 + ‖f‖_2^2 + 2⟨P_V A P_T x, f⟩.

By Lemma 2.4, assuming α ≤ c_1(δ), with probability at least 1 − 2 exp(−c_2(δ)m),

    (1 − δ)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ)‖x‖_2^2    (2.7)

holds universally for any such T and x.
Now we fix T and V, and we want to bound ‖P_V A P_T‖_{2,2}. By Lemma 2.5, we
actually have

    ‖P_V A P_T‖_{2,2} ≤ (1/√m)(√(2s_1) + √(2s_2) + δ√m) ≤ 2√(2α) + δ    (2.8)

with probability at least 1 − 2 exp(−δ^2 m/2). Then with probability at least
1 − 2 exp(−δ^2 m/2) · (n choose 2s_1) · (m choose 2s_2), (2.8) holds universally for any T satisfying
|T| = 2s_1 and V satisfying |V| = 2s_2. Since 2s_1 ≤ 2αm/(log(n/m) + 1), we have
2s_1 log(en/(2s_1)) ≤ α_1 m, where α_1 only depends on α and α_1 → 0 as α → 0, and hence
(n choose 2s_1) ≤ (en/(2s_1))^{2s_1} ≤ exp(α_1 m). Similarly, because 2s_2 ≤ 2αm, we have
2s_2 log(em/(2s_2)) ≤ α_2 m, where α_2 only depends on α and α_2 → 0 as α → 0, and hence
(m choose 2s_2) ≤ (em/(2s_2))^{2s_2} ≤ exp(α_2 m). Therefore, (2.8) holds universally for any such
T and V with probability at least 1 − 2 exp(−(δ^2/2 − α_1 − α_2)m).
Combined with (2.7), we have that

    (1 − δ)(‖x‖_2^2 + ‖f‖_2^2) − 2(2√(2α) + δ)‖x‖_2‖f‖_2
      ≤ ‖[A, I][x; f]‖_2^2 ≤ (1 + δ)(‖x‖_2^2 + ‖f‖_2^2) + 2(2√(2α) + δ)‖x‖_2‖f‖_2

holds universally for any such T, V, x, and f with probability at least
1 − 2 exp(−c_2(δ)m) − 2 exp(−(δ^2/2 − α_1 − α_2)m). By choosing an appropriate δ and
letting α be sufficiently small, we have δ_{2s1,2s2} < 1/9 with probability at least
1 − Ce^{−cm}.
Moreover, under the assumption that αm/(log(n/m) + 1) ≥ 1, we have s_1 =
⌊αm/(log(n/m) + 1)⌋ > 0, s_2 = ⌊αm⌋ > 0, and (1/2)√(s_1/s_2) < 1/√(log(n/m) + 1) < 2√(s_1/s_2).
Then Theorem 1.1 is a direct corollary of Lemma 2.3. □

3 A Proof of Theorem 1.2

In this section, we will encounter several absolute constants. Instead of denoting them
by C1 , C2 , . . . , we just use C, i.e., the values of C change from line to line. Also, we
will use the phrase “with high probability” to mean with probability at least 1−Cn−c ,
where C > 0 is a numerical constant and c = 3, 4, or 5 depending on the context.
Here we will use a lot of notation to represent sub-matrices and sub-vectors. Sup-
pose A ∈ Rm×n , P ⊂ [m] := {1, . . . , m}, Q ⊂ [n], and i ∈ [n]. We denote by AP ,: the
sub-matrix of A with row indices contained in P , by A:,Q the sub-matrix of A with
column indices contained in Q, and by AP ,Q the sub-matrix of A with row indices
contained in P and column indices contained in Q. Moreover, we denote by AP ,i the
sub-matrix of A with row indices contained in P and column i, which is actually a
column vector.
The term “vector” means column vector in this section, and all row vectors are
denoted by an adjoint of a vector, such as a ∗ for a vector a. Suppose a is a vector and
T a subset of indices. Then we denote by aT the restriction of a on T , i.e., a vector
with all elements of a with indices in T . For any vector v, we use v{i} to denote the
i-th element of v.

3.1 Supporting Lemmas

To prove Theorem 1.2, we need some supporting lemmas. Because our model of
sensing matrix A is the same as in [6], we will cite some lemmas from it directly.

Lemma 3.1 (Lemma 2.1 of [6]) Suppose A is as defined in Model 2. Let T ⊂
[n] be a fixed set of cardinality s. Then for δ > 0,

    P(‖A_{:,T}^* A_{:,T} − I‖_{2,2} ≥ δ) ≤ 2s exp(−(m/(μs)) · δ^2/(2(1 + δ/3))).

In particular, ‖A_{:,T}^* A_{:,T} − I‖_{2,2} ≤ 1/2 with high probability, provided
s ≤ γ m/(μ log n), and ‖A_{:,T}^* A_{:,T} − I‖_{2,2} ≤ 1/(2√(log n)) with high probability, pro-
vided s ≤ γ m/(μ log^2 n), where γ is some absolute constant.

This lemma was proved in [6] by matrix Bernstein’s inequality, which was first
introduced by [2]. A deep generalization is given in [38].

Lemma 3.2 (Lemma 2.4 of [6]) Suppose A is as defined in Model 2. Fix T ⊂ [n]
with |T| = s and v ∈ R^s. Then ‖A_{:,T^c}^* A_{:,T} v‖_∞ ≤ (1/(20√s))‖v‖_2 with high probability,
provided s ≤ γ m/(μ log n), where γ is some absolute constant.

Lemma 3.3 (Lemma 2.5 of [6]) Suppose A is as defined in Model 2. Fix T ⊂ [n] with
|T| = s. Then max_{i∈T^c} ‖A_{:,T}^* A_{:,i}‖_2 ≤ 1 with high probability, provided s ≤ γ m/(μ log n),
where γ is some absolute constant.

3.2 A Proof of Theorem 1.2

In this part, we will give a complete proof of Theorem 1.2 with a powerful technique
called the "golfing scheme," introduced by David Gross in [21] and used later in [12]
and [6]. Under the assumptions of Model 2, we additionally assume s ≤ αm/(μ log^2 n) and
m_b ≤ βm/μ, where α and β are numerical constants whose values will be specified
later.
First we give two useful inequalities. By replacing A with √(m/(m − m_b)) A_{B^c,T} in
Lemma 3.1 and Lemma 3.3, we have

    ‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T} − I‖_{2,2} ≤ 1/2    (3.1)

and

    max_{i∈T^c} ‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,i}‖_2 ≤ 1    (3.2)

with high probability, provided s ≤ γ(m − m_b)/(μ log n). Since s ≤ αm/(μ log^2 n) and m_b ≤ βm/μ, both
(3.1) and (3.2) hold with high probability, provided α and β are sufficiently small.
We assume (3.1) and (3.2) hold throughout this section.
First we prove that the solution (x̂, fˆ) of (1.3) equals (x, f ) if we can find an
appropriate dual vector qB c satisfying the following requirement. This is actually an
“inexact dual vector” of the optimization problem (1.3). This idea was first given
explicitly in [22] and [21], and related to [4]. We give a result similar to [6].

Lemma 3.4 (Inexact Duality) Suppose there exists a vector q_{B^c} ∈ R^{m−m_b} satisfying

    ‖v_T − sgn(x_T)‖_2 ≤ λ/4,   ‖v_{T^c}‖_∞ ≤ 1/4,   and   ‖q_{B^c}‖_∞ ≤ λ/4,    (3.3)

where

    v = A_{B^c,:}^* q_{B^c} + λ A_{B,:}^* sgn(f_B).    (3.4)

Then the solution (x̂, f̂) of (1.3) equals (x, f) provided β is sufficiently small and
λ < 3/2.

Proof Set h = x̂ − x. By x_{T^c} = 0, we have

    h_{T^c} = x̂_{T^c}.    (3.5)

By f_{B^c} = 0 and Ax + f = Ax̂ + f̂, we have Ah = f − f̂ and

    A_{B^c,:} h = (f − f̂)_{B^c} = −f̂_{B^c}.    (3.6)

Then we have the following inequality:

    ‖x̂‖_1 + λ‖f̂‖_1
      = ⟨x̂_T, sgn(x̂_T)⟩ + ‖x̂_{T^c}‖_1 + λ(⟨f̂_B, sgn(f̂_B)⟩ + ‖f̂_{B^c}‖_1)
      ≥ ⟨x̂_T, sgn(x_T)⟩ + ‖x̂_{T^c}‖_1 + λ(⟨f̂_B, sgn(f_B)⟩ + ‖f̂_{B^c}‖_1)
      = ⟨x_T + h_T, sgn(x_T)⟩ + ‖h_{T^c}‖_1 + λ(⟨f_B − A_{B,:}h, sgn(f_B)⟩ + ‖A_{B^c,:}h‖_1)    (by (3.5), (3.6))
      = ‖x‖_1 + λ‖f‖_1 + ‖h_{T^c}‖_1 + λ‖A_{B^c,:}h‖_1 + ⟨h_T, sgn(x_T)⟩ − λ⟨A_{B,:}h, sgn(f_B)⟩.

Since ‖x̂‖_1 + λ‖f̂‖_1 ≤ ‖x‖_1 + λ‖f‖_1, we have

    ‖h_{T^c}‖_1 + λ‖A_{B^c,:}h‖_1 + ⟨h_T, sgn(x_T)⟩ − λ⟨A_{B,:}h, sgn(f_B)⟩ ≤ 0.    (3.7)

By (3.4), we have

    ⟨h_T, v_T⟩ + ⟨h_{T^c}, v_{T^c}⟩ = ⟨h, v⟩ = ⟨h, A_{B^c,:}^* q_{B^c} + λ A_{B,:}^* sgn(f_B)⟩
      = ⟨A_{B^c,:}h, q_{B^c}⟩ + λ⟨A_{B,:}h, sgn(f_B)⟩,

and then by (3.3),

    ⟨h_T, sgn(x_T)⟩ − λ⟨A_{B,:}h, sgn(f_B)⟩ = ⟨h_T, sgn(x_T) − v_T⟩ + ⟨A_{B^c,:}h, q_{B^c}⟩ − ⟨h_{T^c}, v_{T^c}⟩
      ≥ −(λ/4)‖h_T‖_2 − (λ/4)‖A_{B^c,:}h‖_1 − (1/4)‖h_{T^c}‖_1.

Combining this with (3.7), we have

    −(λ/4)‖h_T‖_2 + (3λ/4)‖A_{B^c,:}h‖_1 + (3/4)‖h_{T^c}‖_1 ≤ 0.    (3.8)

By (3.1), we have ‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T}‖_{2,2} ≤ 3/2, and the smallest singular value of
(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T} is at least 1/2. Therefore,

    ‖h_T‖_2 ≤ 2‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T} h_T‖_2
      ≤ 2‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T^c} h_{T^c}‖_2 + 2‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,:} h‖_2
      ≤ 2‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,T^c} h_{T^c}‖_2 + √6 ‖√(m/(m − m_b)) A_{B^c,:} h‖_2
      ≤ 2 Σ_{i∈T^c} ‖(m/(m − m_b)) A_{B^c,T}^* A_{B^c,i}‖_2 |h{i}| + √6 ‖√(m/(m − m_b)) A_{B^c,:} h‖_2   (by the triangle inequality)
      ≤ 2‖h_{T^c}‖_1 + √6 ‖√(m/(m − m_b)) A_{B^c,:} h‖_2   (by (3.2)).

Plugging this into (3.8), we have (3/4 − λ/2)‖h_{T^c}‖_1 + (3/4 − (√6/4)√(m/(m − m_b)))λ‖A_{B^c,:}h‖_1 ≤ 0.
We know 3/4 − (√6/4)√(m/(m − m_b)) > 0 when β is sufficiently small. Moreover, by the assump-
tion λ < 3/2, we have h_{T^c} = 0 and A_{B^c,:}h = 0. Since A_{B^c,:}h = A_{B^c,T}h_T + A_{B^c,T^c}h_{T^c},
we have A_{B^c,T}h_T = 0. The inequality (3.1) implies that A_{B^c,T} is injective, so h_T = 0
and h = h_T + h_{T^c} = 0, which implies (x̂, f̂) = (x, f). □

Now let us construct a vector q_{B^c} satisfying the requirement (3.3) by choosing an
appropriate λ.

Proof of Theorem 1.2 Set λ = 1/√(log n). It suffices to construct a q_{B^c} satisfying (3.3).
Denoting u = A_{B^c,:}^* q_{B^c}, we only need to construct a q_{B^c} satisfying

    ‖u_T + λA_{B,T}^* sgn(f_B) − sgn(x_T)‖_2 ≤ λ/4,   ‖u_{T^c}‖_∞ ≤ 1/8,
    ‖λA_{B,:}^* sgn(f_B)‖_∞ ≤ 1/8,   ‖q_{B^c}‖_∞ ≤ λ/4.

Now let us construct our q_{B^c} by the golfing scheme. First we have to write A_{B^c,:}
as a block matrix. We divide B^c into l = ⌊log_2 n⌋ + 1 = ⌊log n / log 2⌋ + 1 disjoint subsets:
B^c = G_1 ∪ ··· ∪ G_l, where |G_i| = m_i. Then we have Σ_{i=1}^l m_i = m − m_b and

    A_{B^c,:} = [A_{G_1,:}; ··· ; A_{G_l,:}].

We want to mention that the partition of B^c is deterministic, not depending on A, so
A_{G_1,:}, …, A_{G_l,:} are independent. Noticing m_b ≤ βm/μ ≤ βm, by letting β be sufficiently
small, we can require

    m/m_1 ≤ C,   m/m_2 ≤ C,   m/m_k ≤ C log n   for k = 3, …, l

for some absolute constant C. Since s ≤ αm/(μ log^2 n), we have

    s ≤ αC m_1/(μ log^2 n),   s ≤ αC m_2/(μ log^2 n),   s ≤ αC m_k/(μ log n)   for k = 3, …, l.    (3.9)

Then by Lemma 3.1, replacing A with √(m/m_j) A_{G_j,T}, we have the following inequali-
ties:

    ‖(m/m_j) A_{G_j,T}^* A_{G_j,T} − I‖_{2,2} ≤ 1/(2√(log n))   for j = 1, 2;    (3.10)

    ‖(m/m_j) A_{G_j,T}^* A_{G_j,T} − I‖_{2,2} ≤ 1/2   for j = 3, …, l;    (3.11)

with high probability, provided α is sufficiently small.


Now let us give an explicit construction of q_{B^c}. Define

    p_0 = sgn(x_T) − λA_{B,T}^* sgn(f_B)    (3.12)

and

    p_i = (I − (m/m_i) A_{G_i,T}^* A_{G_i,T}) p_{i−1}
        = (I − (m/m_i) A_{G_i,T}^* A_{G_i,T}) ··· (I − (m/m_1) A_{G_1,T}^* A_{G_1,T}) p_0    (3.13)

for i = 1, …, l, and construct

    q_{B^c} = [(m/m_1) A_{G_1,T} p_0; … ; (m/m_l) A_{G_l,T} p_{l−1}].    (3.14)

Then by u = A_{B^c,:}^* q_{B^c}, we have

    u = A_{B^c,:}^* [(m/m_1) A_{G_1,T} p_0; … ; (m/m_l) A_{G_l,T} p_{l−1}] = Σ_{i=1}^l (m/m_i) A_{G_i,:}^* A_{G_i,T} p_{i−1}.    (3.15)

We now bound the ℓ2 norm of p_i. Actually, by (3.10), (3.11), and (3.13), we have

    ‖p_1‖_2 ≤ (1/(2√(log n))) ‖p_0‖_2,    (3.16)

    ‖p_2‖_2 ≤ (1/(4 log n)) ‖p_0‖_2,    (3.17)

    ‖p_j‖_2 ≤ (1/log n)(1/2)^j ‖p_0‖_2   for j = 3, …, l.    (3.18)
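The construction (3.12)–(3.14) is purely linear-algebraic once the blocks are fixed; the following sketch (illustrative only, with assumed inputs A, T, B and an assumed equal-size partition of B^c, whereas the proof uses blocks of nonuniform sizes) builds the vectors p_i and q_{B^c} exactly as above.

```python
import numpy as np

def golfing_dual(A, T, B, lam, sgn_xT, sgn_fB, num_blocks, rng):
    """Sketch of (3.12)-(3.14): build q_{B^c} block by block. All inputs are assumed/illustrative."""
    m, n = A.shape
    Bc = np.setdiff1d(np.arange(m), B)
    blocks = np.array_split(rng.permutation(Bc), num_blocks)   # a simple partition of B^c

    p = sgn_xT - lam * A[np.ix_(B, T)].T @ sgn_fB              # p_0 of (3.12)
    q = np.zeros(m)
    for G in blocks:
        A_GT = A[np.ix_(G, T)]
        q[G] = (m / len(G)) * (A_GT @ p)                        # block of q_{B^c} in (3.14)
        p = p - (m / len(G)) * (A_GT.T @ (A_GT @ p))            # p_i of (3.13)
    return q[Bc], p                                             # q_{B^c} and the residual p_l

# Tiny illustrative example (all sizes are assumptions).
rng = np.random.default_rng(7)
m, n, s, mb = 400, 800, 10, 20
A = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, s, replace=False)
B = rng.choice(m, mb, replace=False)
q_Bc, p_l = golfing_dual(A, T, B, 1 / np.sqrt(np.log(n)),
                         rng.choice([-1.0, 1.0], s), rng.choice([-1.0, 1.0], mb),
                         num_blocks=int(np.log2(n)) + 1, rng=rng)
print("residual ||p_l||_2 =", np.linalg.norm(p_l))   # should shrink geometrically, cf. (3.16)-(3.18)
```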
Now we will prove that our constructed q_{B^c} satisfies the desired requirements.

The Proof of ‖λA_{B,:}^* sgn(f_B)‖_∞ ≤ 1/8  By Hoeffding's inequality, for any i =
1, …, n, we have P(|A_{B,i}^* sgn(f_B)| ≥ t) ≤ 2 exp(−2t^2/(4‖A_{B,i}‖_2^2)). By choosing t =
C√(log n)‖A_{B,i}‖_2 (C is some absolute constant), with high probability we have
|λA_{B,i}^* sgn(f_B)| ≤ λC√(log n)‖A_{B,i}‖_2 ≤ C√(μ m_b/m) ≤ C√β ≤ 1/8, provided β is suffi-
ciently small, and this implies ‖λA_{B,:}^* sgn(f_B)‖_∞ ≤ 1/8.

The Proof of ‖u_T + λA_{B,T}^* sgn(f_B) − sgn(x_T)‖_2 ≤ λ/4  By (3.15) and (3.13), we
have u_T = Σ_{i=1}^l (m/m_i) A_{G_i,T}^* A_{G_i,T} p_{i−1} = Σ_{i=1}^l (p_{i−1} − p_i) = p_0 − p_l. Then by
(3.12), we have ‖u_T + λA_{B,T}^* sgn(f_B) − sgn(x_T)‖_2 = ‖u_T − p_0‖_2 = ‖p_l‖_2. Since
‖λA_{B,:}^* sgn(f_B)‖_∞ ≤ 1/8, we have ‖λA_{B,T}^* sgn(f_B)‖_2 ≤ (1/8)√s, which implies

    ‖p_0‖_2 = ‖λA_{B,T}^* sgn(f_B) − sgn(x_T)‖_2 ≤ (9/8)√s.    (3.19)

Then by (3.18) and l = ⌊log_2 n⌋ + 1, we have ‖p_l‖_2 ≤ (1/log n)(1/2)^l (9/8)√s ≤ (1/log n)(1/n)(9/8) ·
√(αm/(μ log^2 n)) ≤ 1/(4√(log n)) = λ/4, provided α is sufficiently small.


The Proof of ‖u_{T^c}‖_∞ ≤ 1/8  By (3.15), we have u_{T^c} = Σ_{i=1}^l (m/m_i) A_{G_i,T^c}^* A_{G_i,T} p_{i−1}.
Recall that A_{G_1,:}, …, A_{G_l,:} are independent, so by the construction of p_{i−1} we
know A_{G_i,:} and p_{i−1} are independent. Replacing A with √(m/m_i) A_{G_i,:} in Lemma
3.2, and by the sparsity condition (3.9), we have ‖Σ_{i=1}^l (m/m_i) A_{G_i,T^c}^* A_{G_i,T} p_{i−1}‖_∞ ≤
Σ_{i=1}^l (1/(20√s))‖p_{i−1}‖_2 with high probability, provided α is sufficiently small. By (3.16),
(3.17), (3.18), and (3.19), we have ‖u_{T^c}‖_∞ ≤ Σ_{i=1}^l (1/(20√s))‖p_{i−1}‖_2 ≤ (1/(20√s)) · 2‖p_0‖_2
< 1/8.


The Proof of ‖q_{B^c}‖_∞ ≤ λ/4  For k = 1, …, l, we denote A_{G_k,:} = (1/√m)[a_{k1}^*; …; a_{k m_k}^*], and
A_{B,:} = (1/√m)[ã_1^*; …; ã_{m_b}^*]. By (3.13), (3.14), and (3.12), it suffices to show that for any
1 ≤ k ≤ l and 1 ≤ j ≤ m_k,

    |(√m/m_k) (a_{kj})_T^* (I − (m/m_{k−1}) A_{G_{k−1},T}^* A_{G_{k−1},T}) ··· (I − (m/m_1) A_{G_1,T}^* A_{G_1,T})
        × (sgn(x_T) − λA_{B,T}^* sgn(f_B))| ≤ λ/4.

Set

    w = (I − (m/m_1) A_{G_1,T}^* A_{G_1,T}) ··· (I − (m/m_{k−1}) A_{G_{k−1},T}^* A_{G_{k−1},T}) (a_{kj})_T.    (3.20)

Then it suffices to prove

    |(√m/m_k) w^* (sgn(x_T) − λA_{B,T}^* sgn(f_B))| ≤ λ/4.

Since w and sgn(x_T) are independent, by Hoeffding's inequality and conditioning
on w, we have P(|w^* sgn(x_T)| ≥ t) ≤ 2 exp(−2t^2/(4‖w‖_2^2)) for any t > 0. Then with high
probability, we have

    |w^* sgn(x_T)| ≤ C√(log n) ‖w‖_2    (3.21)

for some absolute constant C.
Setting z = sgn(f_B), we have w^* A_{B,T}^* sgn(f_B) = (1/√m) Σ_{i=1}^{m_b} [(ã_i)_T^* w] z{i}. Since w,
A_{B,T} and z are independent, by conditioning on w we have

    E[((ã_i)_T^* w) z{i}] = E[(ã_i)_T^* w] E[z{i}] = 0,

    |((ã_i)_T^* w) z{i}| ≤ ‖w‖_2 ‖(ã_i)_T‖_2 ≤ √(sμ) ‖w‖_2 ≤ √(αm/log^2 n) ‖w‖_2,

and

    E[|((ã_i)_T^* w) z{i}|^2] = E[w^* (ã_i)_T (ã_i)_T^* w] = w^* E[(ã_i)_T (ã_i)_T^*] w = ‖w‖_2^2.

By Bernstein's inequality, we have

    P(|w^* A_{B,T}^* sgn(f_B)| ≥ t/√m) ≤ 2 exp(−(t^2/2)/(m_b‖w‖_2^2 + √(αm/log^2 n) ‖w‖_2 t/3)).

By choosing some numerical constant C and t = C√(m log n)‖w‖_2, we have

    |w^* A_{B,T}^* sgn(f_B)| ≤ C√(log n) ‖w‖_2    (3.22)

with high probability, provided α is sufficiently small.
By (3.21) and (3.22), we have

    |(√m/m_k) w^* (sgn(x_T) − λA_{B,T}^* sgn(f_B))| ≤ (√m/m_k) C√(log n) ‖w‖_2    (3.23)

for some numerical constant C.
When k ≥ 3, by (3.20), (3.10), and (3.11), we have ‖w‖_2 ≤ (1/2)^{k−1}(1/log n)√(μs) ≤
√(αm)/log^2 n. Recalling m/m_k ≤ C log n, by (3.23), we have |(√m/m_k) w^*(sgn(x_T) − λA_{B,T}^* sgn(f_B))|
≤ C(m/m_k)√α (log n)^{−3/2} ≤ λ/4, provided α is sufficiently small.
When k ≤ 2, by (3.20) and (3.10), we have ‖w‖_2 ≤ √(μs) ≤ √(αm)/log n. Recalling m/m_k ≤
C, by (3.23), we have |(√m/m_k) w^*(sgn(x_T) − λA_{B,T}^* sgn(f_B))| ≤ C(m/m_k)√α (log n)^{−1/2} ≤
λ/4, provided α is sufficiently small. □

Here we would like to compare our golfing scheme with that in [6]. There are
mainly two differences. One is that we have an extra term λA_{B,:}^* sgn(f_B) in the dual
vector. To obtain the inequality ‖v_{T^c}‖_∞ ≤ 1/4, we propose to bound ‖u_{T^c}‖_∞ and
‖λA_{B,:}^* sgn(f_B)‖_∞, respectively, and this leads to the extra log factor compared
with [6]. Moreover, by using the golfing scheme to construct the dual vector, we need
to bound the term ‖q_{B^c}‖_∞, which is not necessary in [6]. This inevitably incurs the
random-signs assumption on the signal.

4 A Proof of Theorem 1.3

In this section, the capital letters X, Y, etc. represent matrices, and the symbols in
script font I, P_T, etc. represent linear operators from a matrix space to a matrix
space. Moreover, for any Ω_0 ⊂ [n] × [n], P_{Ω_0} M keeps the entries of M on
the support Ω_0 and sets the other entries to zero. For any n × n matrix A, denote
by ‖A‖_F, ‖A‖, ‖A‖_∞, and ‖A‖_*, respectively, the Frobenius norm, the operator norm
(the largest singular value), the largest magnitude of the entries, and the nuclear
norm (the sum of all singular values).
Similarly to Sect. 3, instead of denoting them as C1 , C2 , . . . , we just use C, whose
values change from line to line. Also, we will use the phrase “with high probability”
to mean with probability at least 1 − Cn−c , where C > 0 is a numerical constant and
c = 3, 4, or 5 depending on the context.

4.1 A Model Equivalent to Model 3.1

Model 3.1 is natural and used in [12], but we will use the following equivalent model
for convenience:

Model 3.2
1. Fix an n by n matrix K, whose entries are either 1 or −1.
2. Define two independent random subsets of [n] × [n]: Γ′ ∼ Ber((1 − 2s)ρ) and
   Ω′ ∼ Ber(2sρ/(1 − ρ + 2sρ)). Moreover, let O := Γ′ ∪ Ω′, which thus satisfies O ∼
   Ber(ρ).
3. Define an n × n random matrix W with independent entries W_ij satisfying
   P(W_ij = 1) = P(W_ij = −1) = 1/2.
4. Define Ω″ ⊂ Ω′: Ω″ := {(i, j) : (i, j) ∈ Ω′, W_ij = K_ij}.
5. Define Ω := Ω″ \ Γ′, and Γ := O \ Ω.
6. Let S satisfy sgn(S) := P_Ω(K).

Obviously, in both Model 3.1 and Model 3.2, the whole setting is determinis-
tic if we fix (O, Ω). Therefore, the probability of (L̂, Ŝ) = (L, S) is determined
by the joint distribution of (O, Ω). It is not difficult to prove that the joint dis-
tributions of (O, Ω) in both models are the same. Indeed, in Model 3.1, we have
that (1_{(i,j)∈O}, 1_{(i,j)∈Ω}) are iid random vectors with the probability distribu-
tion P(1_{(i,j)∈O} = 1) = ρ, P(1_{(i,j)∈Ω} = 1 | 1_{(i,j)∈O} = 1) = s and P(1_{(i,j)∈Ω} =
1 | 1_{(i,j)∈O} = 0) = 0. In Model 3.2, we have

    (1_{(i,j)∈O}, 1_{(i,j)∈Ω}) = (max(1_{(i,j)∈Γ′}, 1_{(i,j)∈Ω′}), 1_{(i,j)∈Ω′} 1_{W_ij = K_ij} 1_{(i,j)∈Γ′^c}).

This implies that (1{(i,j )∈O} , 1{(i,j )∈Ω} ) are independent random vectors. More-
over, it is easy to calculate that P(1{(i,j )∈O} = 1) = ρ, P(1{(i,j )∈Ω} = 1) = sρ and
P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 0) = 0. Then we have

P(1{(i,j )∈Ω} = 1|1{(i,j )∈O} = 1) = P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 1)/ P(1{(i,j )∈O} = 1)
=s

and

P(1{(i,j )∈Ω} = 1|1{(i,j )∈O} = 0) = P(1{(i,j )∈Ω} = 1, 1{(i,j )∈O} = 0)/ P(1{(i,j )∈O} = 0)
= 0.

Notice that although (1{(i,j )∈O} , 1{(i,j )∈Ω} ) depends on K, its distribution does not.
By the above, we know that (O, Ω) has the same distribution in both models. There-
fore in the following we will use Model 3.2 instead. The advantage of using Model 3.2
is that we can utilize Γ  , Ω  , W , etc. as auxiliaries.
In the next section, we prove some supporting lemmas which are useful for the
proof of the main theorem.

4.2 Supporting Lemmas

Define T := {UX^* + YV^* : X, Y ∈ R^{n×r}}, a subspace of R^{n×n}. Then the orthogonal
projectors P_T and P_{T⊥} in R^{n×n} satisfy P_T X = UU^*X + XVV^* − UU^*XVV^* and
P_{T⊥} X = (I − UU^*)X(I − VV^*) for any X ∈ R^{n×n}. This means ‖P_{T⊥} X‖ ≤ ‖X‖ for
any X. Recalling the incoherence conditions, for any i ∈ {1, …, n}, ‖UU^* e_i‖_2^2 ≤ μr/n
and ‖VV^* e_i‖_2^2 ≤ μr/n, we have ‖P_T(e_i e_j^*)‖_∞ ≤ 2μr/n and ‖P_T(e_i e_j^*)‖_F^2 ≤
2μr/n [7, 9].
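These projectors are easy to realize numerically. The sketch below (illustrative only, with randomly generated U, V) implements P_T and P_{T⊥} as defined above and evaluates ‖P_T(e_i e_j^*)‖_F^2, the quantity bounded by 2μr/n in the text.

```python
import numpy as np

def make_projectors(U, V):
    """Return P_T and P_T_perp acting on n-by-n matrices, for orthonormal U, V (n x r)."""
    PU, PV = U @ U.T, V @ V.T
    def P_T(X):
        return PU @ X + X @ PV - PU @ X @ PV
    def P_T_perp(X):
        return X - P_T(X)
    return P_T, P_T_perp

# Illustrative example: random orthonormal U, V of rank r.
n, r = 50, 3
rng = np.random.default_rng(8)
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
P_T, P_T_perp = make_projectors(U, V)

E = np.zeros((n, n)); E[4, 7] = 1.0              # e_i e_j^*
print("||P_T(e_i e_j^*)||_F^2 =", np.linalg.norm(P_T(E), 'fro') ** 2)
print("orthogonality check:", np.linalg.norm(P_T(P_T_perp(rng.standard_normal((n, n)))), 'fro'))
```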

Lemma 4.1 (Theorem 4.1 of [7]) Suppose Ω_0 ∼ Ber(ρ_0). Then with high probability,
‖P_T − ρ_0^{−1} P_T P_{Ω_0} P_T‖ ≤ ε, provided that ρ_0 ≥ C_0 ε^{−2} μr log n / n for some numerical
constant C_0 > 0.

The original idea of the proof of this theorem is due to [34].

Lemma 4.2 (Theorem 3.1 of [12]) Suppose Z ∈ Range(P_T) is a fixed matrix,
Ω_0 ∼ Ber(ρ_0), and ε ≤ 1 is an arbitrary constant. Then with high probability,
‖(I − ρ_0^{−1} P_T P_{Ω_0})Z‖_∞ ≤ ε‖Z‖_∞, provided that ρ_0 ≥ C_0 ε^{−2} μr log n / n for some nu-
merical constant C_0 > 0.

Lemma 4.3 (Theorem 6.3 of [7]) Suppose Z is a fixed matrix and Ω_0 ∼ Ber(ρ_0).
Then with high probability, ‖(ρ_0 I − P_{Ω_0})Z‖ ≤ C_0′ √(np log n) ‖Z‖_∞, provided that
ρ_0 ≤ p and p ≥ C_0 log n / n for some numerical constants C_0 > 0 and C_0′ > 0.

Notice that we only have ρ0 = p in Theorem 6.3 of [7]. By a very slight modifica-
tion in the proof (specifically, the proof of Lemma 6.2), we can have ρ0 ≤ p as stated
above.

4.3 A Proof of Theorem 1.3

By Lemma 4.1, we have ‖(1/((1 − 2s)ρ)) P_T P_{Γ′} P_T − P_T‖ ≤ 1/2 and ‖(1/√((1 − 2s)ρ)) P_T P_{Γ′}‖ ≤
√(3/2) with high probability, provided C_ρ is sufficiently large and C_s is sufficiently
small. We will assume both inequalities hold throughout the paper.

Theorem 4.4 If there exists an n × n matrix Y obeying

    ‖P_T Y + P_T(λP_{O\Γ′} W − UV^*)‖_F ≤ λ/n^2,
    ‖P_{T⊥} Y + P_{T⊥}(λP_{O\Γ′} W)‖ ≤ 1/4,
    P_{Γ′^c} Y = 0,                                    (4.1)
    ‖P_{Γ′} Y‖_∞ ≤ λ/4,

where λ = 1/√(nρ log n), then the solution (L̂, Ŝ) to (1.6) satisfies (L̂, Ŝ) = (L, S).

Proof Set H = L̂ − L. The condition P_O(L) + S = P_O(L̂) + Ŝ implies that
P_O(H) = S − Ŝ. Then Ŝ is supported on O because S is supported on Ω ⊂ O.
By considering the subgradient of the nuclear norm at L, we have

    ‖L̂‖_* ≥ ‖L‖_* + ⟨P_T H, UV^*⟩ + ‖P_{T⊥} H‖_*.

By the definition of (L̂, Ŝ), we have

    ‖L̂‖_* + λ‖Ŝ‖_1 ≤ ‖L‖_* + λ‖S‖_1.

By the two inequalities above, we have

    λ‖S‖_1 − λ‖Ŝ‖_1 ≥ ⟨P_T(H), UV^*⟩ + ‖P_{T⊥} H‖_*,

which implies

    λ‖S‖_1 − λ‖P_{O\Γ′}(Ŝ)‖_1 ≥ ⟨H, UV^*⟩ + ‖P_{T⊥}(H)‖_* + λ‖P_{Γ′}(Ŝ)‖_1.

On the other hand,

    ‖P_{O\Γ′} Ŝ‖_1 = ‖S + P_{O\Γ′}(−H)‖_1
      ≥ ‖S‖_1 + ⟨sgn(S), P_Ω(−H)⟩ + ‖P_{O\(Γ′∪Ω)}(−H)‖_1
      ≥ ‖S‖_1 + ⟨P_{O\Γ′}(W), −H⟩.

By the two inequalities above and the fact P_{Γ′} Ŝ = P_{Γ′}(Ŝ − S) = −P_{Γ′} H, we have

    ‖P_{T⊥}(H)‖_* + λ‖P_{Γ′}(H)‖_1 ≤ ⟨H, λP_{O\Γ′}(W) − UV^*⟩.    (4.2)

By the assumptions on Y, we have

    ⟨H, λP_{O\Γ′}(W) − UV^*⟩
      = ⟨H, Y + λP_{O\Γ′}(W) − UV^*⟩ − ⟨H, Y⟩
      = ⟨P_T(H), P_T(Y + λP_{O\Γ′}(W) − UV^*)⟩ + ⟨P_{T⊥}(H), P_{T⊥}(Y + λP_{O\Γ′}(W))⟩
        − ⟨P_{Γ′}(H), P_{Γ′}(Y)⟩ − ⟨P_{Γ′^c}(H), P_{Γ′^c}(Y)⟩
      ≤ (λ/n^2)‖P_T(H)‖_F + (1/4)‖P_{T⊥}(H)‖_* + (λ/4)‖P_{Γ′}(H)‖_1.

By inequality (4.2),

    (3/4)‖P_{T⊥}(H)‖_* + (3λ/4)‖P_{Γ′}(H)‖_1 ≤ (λ/n^2)‖P_T(H)‖_F.    (4.3)

Recall that we assume ‖(1/((1 − 2s)ρ)) P_T P_{Γ′} P_T − P_T‖ ≤ 1/2 and ‖(1/√((1 − 2s)ρ)) P_T P_{Γ′}‖ ≤
√(3/2) throughout the paper. Then

    ‖P_T(H)‖_F ≤ 2‖(1/((1 − 2s)ρ)) P_T P_{Γ′} P_T(H)‖_F
      ≤ 2‖(1/((1 − 2s)ρ)) P_T P_{Γ′} P_{T⊥}(H)‖_F + 2‖(1/((1 − 2s)ρ)) P_T P_{Γ′}(H)‖_F
      ≤ √(6/((1 − 2s)ρ)) ‖P_{T⊥} H‖_F + √(6/((1 − 2s)ρ)) ‖P_{Γ′} H‖_F.

By inequality (4.3), we have

    (3/4 − (λ/n^2)√(6/((1 − 2s)ρ))) ‖P_{T⊥}(H)‖_* + (3λ/4 − (λ/n^2)√(6/((1 − 2s)ρ))) ‖P_{Γ′} H‖_F ≤ 0.

Then P_{T⊥}(H) = P_{Γ′} H = 0, which implies P_{Γ′} P_T(H) = 0. Since P_{Γ′} P_T is injective
on T (as ‖(1/((1 − 2s)ρ)) P_T P_{Γ′} P_T − P_T‖ ≤ 1/2), we have P_T(H) = 0. Then we have H = 0. □

Suppose we can construct Y and Ỹ satisfying

    ‖P_T Y + P_T(λP_{Ω′} W − UV^*)‖_F ≤ λ/(2n^2),
    ‖P_{T⊥} Y + P_{T⊥}(λP_{Ω′} W)‖ ≤ 1/4,
    P_{Γ′^c} Y = 0,                                    (4.4)
    ‖P_{Γ′} Y‖_∞ ≤ λ/4,

and

    ‖P_T Ỹ + P_T(λ(2P_{Ω′\Γ′}(W) − P_{Ω′} W) − UV^*)‖_F ≤ λ/(2n^2),
    ‖P_{T⊥} Ỹ + P_{T⊥}(λ(2P_{Ω′\Γ′}(W) − P_{Ω′} W))‖ ≤ 1/4,
    P_{Γ′^c} Ỹ = 0,                                    (4.5)
    ‖P_{Γ′} Ỹ‖_∞ ≤ λ/4.

Then (Y + Ỹ)/2 will satisfy (4.1). By the assumptions in Model 3.2, (Γ′, P_{Ω′} W)
and (Γ′, 2P_{Ω′\Γ′}(W) − P_{Ω′} W) have the same distribution. Therefore, if we can con-
struct Y satisfying (4.4) with high probability, we can also construct Ỹ satisfying
(4.5) with high probability. Therefore, to prove Theorem 1.3, we only need to prove
that there exists Y satisfying (4.4) with high probability.

Proof of Theorem 1.3 Notice that Γ′ ∼ Ber((1 − 2s)ρ). Suppose that q satisfies
1 − (1 − 2s)ρ = (1 − (1 − 2s)ρ/6)^2 (1 − q)^{l−2}, where l = 5 log n + 1. This implies
that q ≥ Cρ/log n. Define q_1 = q_2 = (1 − 2s)ρ/6 and q_3 = ··· = q_l = q. Then in
distribution, we can let Γ′ = Γ_1 ∪ ··· ∪ Γ_l, where Γ_j ∼ Ber(q_j) independently.
Construct

    Z_0 = P_T(UV^* − λP_{Ω′} W),
    Z_j = (P_T − q_j^{−1} P_T P_{Γ_j} P_T) Z_{j−1}   for j = 1, …, l,
    Y = Σ_{j=1}^l q_j^{−1} P_{Γ_j} Z_{j−1}.
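This matrix version of the golfing scheme can also be written down directly. The sketch below is illustrative only: U, V, Ω′, the partition Γ_1, …, Γ_l, and λ are assumed inputs, and the block rates q_j are crude stand-ins (the proof determines q_3, …, q_l from the exact identity above).

```python
import numpy as np

def golfing_certificate(U, V, W_omega, gammas, qs, lam):
    """Sketch of the construction above: Z_0 = P_T(UV^* - lam*P_{Omega'}W), then golf over Gamma_j.
    gammas: boolean masks Gamma_j; qs: their Bernoulli rates q_j. Illustrative only."""
    PU, PV = U @ U.T, V @ V.T
    P_T = lambda X: PU @ X + X @ PV - PU @ X @ PV

    Z = P_T(U @ V.T - lam * W_omega)             # Z_0
    Y = np.zeros_like(Z)
    for mask, q in zip(gammas, qs):
        Y += (mask * Z) / q                       # accumulate q_j^{-1} P_{Gamma_j} Z_{j-1}
        Z = Z - P_T(mask * Z) / q                 # Z_j, using that Z_{j-1} lies in Range(P_T)
    return Y, Z                                   # dual candidate Y and the residual Z_l

# Tiny illustrative run (parameters are assumptions, not the paper's regime).
n, r, rho, s = 80, 2, 0.4, 0.05
rng = np.random.default_rng(9)
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W = rng.choice([-1.0, 1.0], size=(n, n))
Omega_p = rng.random((n, n)) < 2 * s * rho / (1 - rho + 2 * s * rho)
l = int(5 * np.log(n)) + 1
qs = [(1 - 2 * s) * rho / 6] * 2 + [(1 - 2 * s) * rho / (2 * l)] * (l - 2)   # stand-in rates
gammas = [rng.random((n, n)) < q for q in qs]
Y, Z_l = golfing_certificate(U, V, Omega_p * W, gammas, qs, 1 / np.sqrt(rho * n * np.log(n)))
print("||Z_l||_F =", np.linalg.norm(Z_l))        # should be tiny, cf. the bound on ||Z_l||_F below
```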

Then by Lemma 4.1, we have

    ‖Z_j‖_F ≤ (1/2)‖Z_{j−1}‖_F   for j = 1, …, l

with high probability, provided C_ρ is large enough and C_s is small enough. Then
‖Z_j‖_F ≤ (1/2)^j ‖Z_0‖_F. By the construction of Z_j, we know that Z_j ∈ Range(P_T) and
Z_j = (I − q_j^{−1} P_T P_{Γ_j}) Z_{j−1}. Then similarly, by Lemma 4.2, we have

    ‖Z_1‖_∞ ≤ (1/(2√(log n))) ‖Z_0‖_∞

and

    ‖Z_j‖_∞ ≤ (1/(2^j log n)) ‖Z_0‖_∞   for j = 2, …, l

with high probability, provided C_ρ is large enough and C_s is small enough. Also, by
Lemma 4.3, we have

    ‖(I − q_j^{−1} P_{Γ_j}) Z_{j−1}‖ ≤ C√(n log n / q_j) ‖Z_{j−1}‖_∞   for j = 1, …, l

with high probability, provided C_ρ is large enough and C_s is small enough.


We first bound ‖Z_0‖_F and ‖Z_0‖_∞. Obviously ‖Z_0‖_∞ ≤ ‖UV^*‖_∞ +
λ‖P_T P_{Ω′}(W)‖_∞. Recall that for any i, j ∈ [n], we have ‖P_T(e_i e_j^*)‖_∞ ≤ 2μr/n and
‖P_T(e_i e_j^*)‖_F^2 ≤ 2μr/n. Moreover, the entries of P_{Ω′}(W) are iid random variables
with the distribution

    (P_{Ω′}(W))_{ij} =  1  with probability sρ/(1 − ρ + 2sρ),
                        0  with probability (1 − ρ)/(1 − ρ + 2sρ),
                       −1  with probability sρ/(1 − ρ + 2sρ).

Then by Bernstein's inequality, we have

    P(|⟨P_T P_{Ω′}(W), e_i e_j^*⟩| ≥ t) = P(|⟨P_{Ω′}(W), P_T(e_i e_j^*)⟩| ≥ t) ≤ 2 exp(−(t^2/2)/(Σ_k E X_k^2 + Mt/3)),

where we have

    Σ_k E X_k^2 = (2sρ/(1 − ρ + 2sρ)) ‖P_T(e_i e_j^*)‖_F^2 ≤ Cρs μr/n

and

    M = ‖P_T(e_i e_j^*)‖_∞ ≤ 2μr/n.

Then with high probability, we have ‖P_T P_{Ω′}(W)‖_∞ ≤ C√(ρ μr log n / n) (note that
√(ρ μr log n / n) ≥ √(C_ρ) μr log^{3/2} n / n > C M log n by the assumption on ρ). Then by
‖UV^*‖_∞ ≤ √(μr)/n, we have ‖Z_0‖_∞ ≤ C√(μr)/n, which implies ‖Z_0‖_F ≤ n‖Z_0‖_∞ ≤ C√(μr).
Now we want to prove that Y satisfies (4.4) with high probability. Obviously, P_{Γ′^c} Y =
0. It suffices to prove

    ‖P_T Y + P_T(λP_{Ω′}(W) − UV^*)‖_F ≤ λ/(2n^2),
    ‖P_{T⊥} Y‖ ≤ 1/8,
    ‖P_{T⊥}(λP_{Ω′}(W))‖ ≤ 1/8,                        (4.6)
    ‖P_{Γ′} Y‖_∞ ≤ λ/4.

First,

    ‖P_T Y + P_T(λP_{Ω′}(W) − UV^*)‖_F
      = ‖Z_0 − Σ_{j=1}^l q_j^{−1} P_T P_{Γ_j} Z_{j−1}‖_F
      = ‖P_T Z_0 − Σ_{j=1}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
      = ‖(P_T − q_1^{−1} P_T P_{Γ_1} P_T) Z_0 − Σ_{j=2}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
      = ‖P_T Z_1 − Σ_{j=2}^l q_j^{−1} P_T P_{Γ_j} P_T Z_{j−1}‖_F
      = ··· = ‖Z_l‖_F ≤ (1/2)^l C√(μr) ≤ λ/(2n^2).

Second,

    ‖P_{T⊥} Y‖ = ‖P_{T⊥} Σ_{j=1}^l q_j^{−1} P_{Γ_j} Z_{j−1}‖
      ≤ Σ_{j=1}^l ‖q_j^{−1} P_{T⊥} P_{Γ_j} Z_{j−1}‖
      = Σ_{j=1}^l ‖P_{T⊥}(q_j^{−1} P_{Γ_j} Z_{j−1} − Z_{j−1})‖
      ≤ Σ_{j=1}^l ‖q_j^{−1} P_{Γ_j} Z_{j−1} − Z_{j−1}‖
      ≤ C Σ_{j=1}^l √(n log n / q_j) ‖Z_{j−1}‖_∞
      ≤ C √(n log n) (Σ_{j=3}^l (1/(2^{j−1} log n))(1/√(q_j)) + (1/(2√(log n)))(1/√(q_2)) + 1/√(q_1)) ‖Z_0‖_∞
      ≤ C √(nμr log n)/(n√ρ) ≤ 1/(8√(log n)),
provided Cρ is sufficiently large.
Third, we have ‖λP_{T⊥} P_{Ω′}(W)‖ ≤ λ‖P_{Ω′}(W)‖. Notice that (W_ij) is an independent
Rademacher sequence independent of Ω′. By Lemma 4.3, we have

    ‖(2sρ/(1 − ρ + 2sρ)) W − P_{Ω′}(W)‖ ≤ C_0′ √(np log n) ‖W‖_∞

with high probability, provided 2sρ/(1 − ρ + 2sρ) ≤ p and p ≥ C_0 log n / n. By Theorem 3.9 of
[39], we have ‖W‖ ≤ C_1√n with high probability. Therefore,

    ‖P_{Ω′}(W)‖ ≤ C_0′ √(np log n) + C_1√n · 2sρ/(1 − ρ + 2sρ).

By choosing p = C_2 ρ for some appropriate C_2, we have ‖P_{Ω′}(W)‖ ≤ √(nρ log n)/8, pro-
vided C_ρ is large enough and C_s is small enough.
Fourth,

    ‖P_{Γ′} Y‖_∞ = ‖Σ_j q_j^{−1} P_{Γ′} P_{Γ_j} Z_{j−1}‖_∞
      ≤ Σ_j q_j^{−1} ‖Z_{j−1}‖_∞
      ≤ (Σ_{j=3}^l (1/q_j)(1/(2^{j−1} log n)) + (1/q_2)(1/(2√(log n))) + 1/q_1) ‖Z_0‖_∞
      ≤ C√(μr)/(nρ) ≤ λ/(4√(log n)),
provided Cρ is sufficiently large. 

Notice that in [12] the authors used a very similar golfing scheme. To compare
these two methods, we use here a golfing scheme of nonuniform sizes to achieve a
result with fewer log factors. Moreover, unlike in [12], where the authors used both
the golfing scheme and the least square method to construct two parts of the dual ma-
trix, here we only use the golfing scheme. Actually, the method to construct the dual
matrix in [12] cannot be applied directly to our problem when ρ = O(r log2 n/n).

Acknowledgements I am grateful to my Ph.D. advisor, Emmanuel Candès, for his encouragements and
his help in preparing this manuscript.

References

1. Agarwal, A., Negahban, S., Wainwright, M.: Noisy matrix decomposition via convex relaxation: opti-
mal rates in high dimensions. In: Proc. 28th Inter. Conf. Mach. Learn. (ICML), pp. 1129–1136 (2011)
2. Ahlswede, R., Winter, A.: Strong converse for identification via quantum channels. IEEE Trans. Inf.
Theory 48(3), 569–579 (2002)
3. Baraniuk, R., Davenport, M., DeVore, R., Wakin, M.: A simple proof of the restricted isometry prop-
erty for random matrices. Constr. Approx. 28(3), 253–263 (2008)
4. Candès, E., Plan, Y.: Matrix completion with noise. In: Proceedings of the IEEE (2009)
5. Candès, E., Plan, Y.: Near-ideal model selection by 1 minimization. Ann. Stat. 37(5A), 2145–2177
(2009)
6. Candès, E., Plan, Y.: A probabilistic and RIPless theory of compressed sensing. IEEE Trans. Inf.
Theory 57(11), 7235–7254 (2011)
7. Candès, E., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6)
(2009)
8. Candès, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12) (2005)
9. Candès, E., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans.
Inf. Theory 56(5), 2053–2080 (2010)
10. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from
highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
11. Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measure-
ments. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)
12. Candès, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3) (2011)
13. Chandrasekaran, V., Sanghavi, S., Parrilo, P., Willsky, A.: Sparse and low-rank matrix decomposi-
tions. In: 15th IFAC Symposium on System Identification (SYSID) (2009)
14. Chandrasekaran, V., Sanghavi, S., Parrilo, P., Willsky, A.: Rank-sparsity incoherence for matrix de-
composition. SIAM J. Optim. 21(2), 572–596 (2011)
15. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput.
20(1), 33–61 (1998)
16. Chen, Y., Jalali, A., Sanghavi, S., Caramanis, C.: Low-rank matrix recovery from errors and erasures.
ISIT (2011)
17. Davidson, K., Szarek, S.: Local operator theory, random matrices and Banach spaces. Handb. Geom.
Banach Spaces I(8), 317–366 (2001)
18. Donoho, D.: For most large underdetermined systems of linear equations the minimal l1-norm solu-
tion is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)

19. Donoho, D.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)
20. Fazel, M.: Matrix rank minimization with applications. Ph.D. Thesis (2002)
21. Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory
57(3), 1548–1566 (2011)
22. Gross, D., Liu, Y.-K., Flammia, S., Becker, S., Eisert, J.: Quantum state tomography via compressed
sensing. Phys. Rev. Lett. 105(15) (2010)
23. Haupt, J., Bajwa, W., Rabbat, M., Nowak, R.: Compressed sensing for networked data. IEEE Signal
Process. Mag. 25(2), 92–101 (2008)
24. Hsu, D., Kakade, S., Zhang, T.: Robust matrix decomposition with sparse corruptions. IEEE Trans.
Inf. Theory 57(11), 7221–7234 (2011)
25. Keshavan, R., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory
56(6), 2980–2998 (2010)
26. Laska, J., Davenport, M., Baraniuk, R.: Exact signal recovery from sparsely corrupted measurements
through the pursuit of justice. In: Asilomar Conference on Signals Systems and Computers (2009)
27. Laska, J., Boufounos, P., Davenport, M., Baraniuk, R.: Democracy in action: quantization, saturation,
and compressive sensing. Appl. Comput. Harmon. Anal. 31(3), 429–443 (2011)
28. Li, Z., Wu, F., Wright, J.: On the systematic measurement matrix for compressed sensing in the pres-
ence of gross errors. In: Data Compression Conference, pp. 356–365 (2010)
29. Nguyen, N., Tran, T.: Exact recoverability from dense corrupted observations via l1 minimization.
Preprint (2011)
30. Nguyen, N., Nasrabadi, N., Tran, T.: Robust lasso with missing and grossly corrupted observations.
Preprint (2011)
31. Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)
32. Recht, B., Fazel, M., Parillo, P.: Guaranteed minimum-rank solutions of linear matrix equations via
nuclear norm minimization. SIAM Rev. 52(3) (2010)
33. Romberg, J.: Compressive sensing by random convolution. SIAM J. Imaging Sci. 2(4), 1098–1128
(2009)
34. Rudelson, M.: Random vectors in the isotropic position. J. Funct. Anal. 164(1), 60–72 (1999)
35. Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements.
Commun. Pure Appl. Math. 61(8), 1025–1045 (2008)
36. Studer, C., Kuppinger, P., Pope, G., Bölcskei, H.: Recovery of sparsely corrupted signals. Preprint
(2011)
37. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1), 267–288
(1996)
38. Tropp, J.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. (2011)
39. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y., Ku-
tyniok, G. (eds.) Compressed Sensing, Theory and Applications, pp. 210–268. Cambridge University
Press, Cambridge (2012), Chap. 5
40. Wright, J., Ma, Y.: Dense error correction via 1 -minimization. IEEE Trans. Inf. Theory 56(7), 3540–
3560 (2010)
41. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S., Ma, Y.: Robust face recognition via sparse representa-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
42. Wu, L., Ganesh, A., Shi, B., Matsushita, Y., Wang, Y., Ma, Y.: Robust photometric stereo via low-rank
matrix completion and recovery. In: Proceedings of the 10th Asian Conference on Computer Vision,
Part III (2010)
43. Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. In: Ad. Neural Infor. Proc. Sys.
(NIPS), pp. 2496–2504 (2010)
