
A Scalable Spectral Relaxation Approach to Matrix Completion via Kronecker Products

Hui Zhao, Jiuqiang Han, Naiyan Wang, Congfu Xu, Zhihua Zhang

Department of Automation, Xi'an Jiaotong University, Xi'an, 710049, China
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
{zhaohui, jqhan}@mail.xjtu.edu.cn, [email protected], {xucongfu, zhzhang}@zju.edu.cn
Abstract

Existing methods for matrix completion, such as the singular value thresholding (SVT), soft-impute, and fixed point continuation (FPCA) algorithms, typically require repeatedly computing singular value decompositions (SVD) of matrices. When the matrix in question is large, finding a solution is computationally costly. To reduce this cost, we apply Kronecker products to the matrix completion problem. In particular, we propose using Kronecker factorization, which approximates a matrix by the Kronecker product of several matrices of smaller sizes. We introduce Kronecker factorization into the soft-impute framework and devise an effective matrix completion algorithm. Especially when the factorized matrices have about the same sizes, the computational complexity of our algorithm is improved substantially.
Introduction

The matrix completion problem (Cai, Candès, and Shen 2010; Candès and Recht 2008; Keshavan, Montanari, and Oh 2009; Mazumder, Hastie, and Tibshirani 2010; Beck and Teboulle 2009) has become increasingly popular because it arises in many applications such as collaborative filtering, image inpainting, and predicting missing data in sensor networks. The problem is to complete a data matrix from a few observed entries. In a recommender system, for example, customers mark ratings on goods, and vendors then collect the customers' preferences to form a customer-good matrix in which the known entries represent actual ratings. In order to make efficient recommendations, the vendors try to recover the missing entries to predict whether a certain customer would like a certain good.

A typical assumption in the matrix completion problem is that the data matrix in question is low rank or approximately low rank (Candès and Recht 2008). This assumption is reasonable in many instances such as recommender systems. On one hand, only a few factors usually contribute to a customer's taste. On the other hand, the low-rank structure suggests that the customers can be viewed as a small number of groups and that the customers within each group have similar tastes.

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Recently, it has been shown that matrix completion is not as ill-posed as originally thought. Srebro, Alon, and Jaakkola (2005) derived useful generalization error bounds for predicting missing entries. Several authors have shown that, under certain assumptions on the proportion and locations of the missing entries, most low-rank matrices can be recovered exactly (Candès and Recht 2008; Candès and Tao 2009; Keshavan, Montanari, and Oh 2009).

The key idea in recovering a low-rank matrix is to solve a so-called matrix rank minimization problem. However, this problem is NP-hard. An efficient approach is to relax the matrix rank into the matrix nuclear norm. This relaxation technique yields a convex reconstruction minimization problem, which can be solved tractably. In particular, Cai, Candès, and Shen (2010) devised a first-order singular value thresholding (SVT) algorithm for this minimization problem. Mazumder, Hastie, and Tibshirani (2010) then considered a more general convex optimization problem for reconstruction and developed a soft-impute algorithm for solving it. Other solutions to the convex relaxation problem include fixed point continuation (FPCA) and Bregman iterative methods (Ma, Goldfarb, and Chen 2009), the augmented Lagrange multiplier method (Lin et al. 2010; Candès et al. 2009), singular value projection (Jain, Meka, and Dhillon 2010), the accelerated proximal gradient algorithm (Toh and Yun 2009), etc.

These methods require repeatedly computing singular value decompositions (SVD). When the size of the matrix in question is large, however, the computational burden is prohibitive. Implementations typically employ a numerically iterative approach to the SVD, such as the Lanczos method, but this does not solve the scaling problem.

In this paper we propose a scalable convex relaxation for the matrix completion problem. In particular, we use a matrix approximation factorization via Kronecker products (Van Loan and Pitsianis 1993; Kolda and Bader 2009). Under the nuclear norm relaxation framework, we formulate a set of convex optimization subproblems, each of which is defined on a smaller-size matrix. Thus, the cost of computing SVDs can be mitigated. This leads us to an effective algorithm for handling the matrix completion problem. Compared with algorithms that use numerical methods to compute the SVD, our algorithm is readily parallelized.

The paper is organized as follows. The next section reviews recent developments on the matrix completion problem. Our completion method based on the Kronecker factorization of a matrix is then presented, followed by our experimental evaluations. Finally, we give some concluding remarks.
Problem Formulations

Consider an $n \times p$ real matrix $X = [x_{ij}]$ with missing entries. Let $\Omega \subset \{1, \ldots, n\} \times \{1, \ldots, p\}$ denote the indices of the observed entries of $X$, and let $\bar{\Omega} = \{1, \ldots, n\} \times \{1, \ldots, p\} \setminus \Omega$ be the indices of the missing entries. In order to complete the missing entries, a typical approach is to define an unknown low-rank matrix $Y = [y_{ij}] \in \mathbb{R}^{n \times p}$ and to formulate the following optimization problem:

$$\min_{Y} \; \mathrm{rank}(Y) \quad \text{s.t.} \quad \sum_{(i,j) \in \Omega} (x_{ij} - y_{ij})^2 \le \delta, \tag{1}$$

where $\mathrm{rank}(Y)$ represents the rank of the matrix $Y$ and $\delta \ge 0$ is a parameter controlling the tolerance in training error.
However, it is not tractable to reconstruct $Y$ from the rank minimization problem in (1), because it is in general an NP-hard problem. Since the nuclear norm of $Y$ is the best convex approximation of the rank function $\mathrm{rank}(Y)$ over the unit ball of matrices, a feasible approach relaxes the rank minimization into a nuclear norm minimization problem (Candès and Tao 2009; Recht, Fazel, and Parrilo 2007). Let $\|Y\|_*$ denote the nuclear norm of $Y$. We have

$$\|Y\|_* = \sum_{i=1}^{r} \sigma_i(Y),$$

where $r = \min\{n, p\}$ and the $\sigma_i(Y)$ are the singular values of $Y$. The nuclear norm minimization problem is defined by

$$\min_{Y} \; \|Y\|_* \quad \text{s.t.} \quad \sum_{(i,j) \in \Omega} (x_{ij} - y_{ij})^2 \le \delta. \tag{2}$$
Clearly, when $\delta = 0$, Problem (2) reduces to

$$\min_{Y} \; \|Y\|_* \quad \text{s.t.} \quad y_{ij} = x_{ij}, \; (i,j) \in \Omega, \tag{3}$$

which has been studied by Cai, Candès, and Shen (2010) and Candès and Tao (2009). However, Mazumder, Hastie, and Tibshirani (2010) argued that (3) is too rigid, possibly resulting in overfitting. In this paper we concentrate on the case in which $\delta > 0$.
The nuclear norm is an effective convex relaxation of the rank function. Moreover, off-the-shelf algorithms such as interior point methods can be used to solve problem (2). However, they are not efficient if the scale of the matrix in question is large. Equivalently, we can reformulate (2) in Lagrangian form as

$$\min_{Y} \; \frac{1}{2} \sum_{(i,j) \in \Omega} (x_{ij} - y_{ij})^2 + \lambda \|Y\|_*, \tag{4}$$

where $\lambda > 0$. Let $P_\Omega(X)$ be the $n \times p$ matrix whose $(i,j)$th entry is $x_{ij}$ if $(i,j) \in \Omega$ and zero otherwise. We write problem (4) in matrix form:

$$\min_{Y} \; \frac{1}{2} \|P_\Omega(X) - P_\Omega(Y)\|_F^2 + \lambda \|Y\|_*, \tag{5}$$

where $\|A\|_F$ represents the Frobenius norm of $A = [a_{ij}]$; that is,

$$\|A\|_F^2 = \sum_{i,j} a_{ij}^2 = \mathrm{tr}(A A^T) = \sum_i \sigma_i^2(A).$$
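As a concrete illustration (our own sketch, not the authors' code), the projection $P_\Omega$ and the objective in (5) take only a few lines of NumPy; the names `X`, `mask`, and `lam` are ours.

```python
import numpy as np

def P_Omega(M, mask):
    """Projection onto the observed entries: keep M[i, j] where mask is True, zero elsewhere."""
    return np.where(mask, M, 0.0)

def objective_eq5(X, Y, mask, lam):
    """Objective of problem (5): 0.5 * ||P_Omega(X) - P_Omega(Y)||_F^2 + lam * ||Y||_*."""
    resid = P_Omega(X, mask) - P_Omega(Y, mask)
    return 0.5 * np.sum(resid ** 2) + lam * np.linalg.norm(Y, ord='nuc')
```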
Recently, Cai, Candès, and Shen (2010) proposed a novel first-order singular value thresholding (SVT) algorithm for the problem (2). The SVT algorithm is based on the notion of a matrix shrinkage operator.

Definition 1 (Matrix Shrinkage Operator) Suppose that the matrix $A$ is an $n \times p$ matrix of rank $r$. Let the condensed SVD of $A$ be $A = U \Sigma V^T$, where $U$ ($n \times r$) and $V$ ($p \times r$) satisfy $U^T U = I_r$ and $V^T V = I_r$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ is the $r \times r$ diagonal matrix with $\sigma_1 \ge \cdots \ge \sigma_r > 0$. For any $\tau > 0$, the matrix shrinkage operator $S_\tau(\cdot)$ is defined by $S_\tau(A) := U \Sigma_\tau V^T$ with

$$\Sigma_\tau = \mathrm{diag}([\sigma_1 - \tau]_+, \ldots, [\sigma_r - \tau]_+).$$

Here $I_r$ denotes the $r \times r$ identity matrix and $[t]_+ = \max(t, 0)$.
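In other words, $S_\tau$ soft-thresholds the singular values while keeping the singular vectors. A minimal NumPy sketch of the operator (ours, not the authors' code) is:

```python
def shrink(A, tau):
    """Matrix shrinkage operator S_tau(A) = U diag([sigma_i - tau]_+) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # condensed SVD
    return (U * np.maximum(s - tau, 0.0)) @ Vt        # soft-threshold the singular values
```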
Using the matrix shrinkage operator, Mazumder, Hastie, and Tibshirani (2010) devised a so-called soft-impute algorithm for solving (5). The detailed procedure of the soft-impute algorithm is given in Algorithm 1, which computes a series of solutions to (5) for a decreasing sequence of regularization parameters $\lambda$. As we see, the algorithm requires an SVD computation of an $n \times p$ matrix at every iteration. When both $n$ and $p$ are large, this is computationally prohibitive.
Algorithm 1 The Soft-Impute Algorithm
1: Initialize $Y^{(\mathrm{old})} = 0$, tolerance $\epsilon > 0$, and create an ordered set $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ in which $\lambda_i \ge \lambda_{i+1}$ for any $i$.
2: for every fixed $\lambda \in \Lambda$ do
3: Compute $M \leftarrow S_\lambda(Y^{(\mathrm{old})})$.
4: Compute $Y^{(\mathrm{new})} \leftarrow P_\Omega(X) + P_{\bar{\Omega}}(M)$.
5: if $\|Y^{(\mathrm{new})} - Y^{(\mathrm{old})}\|_F / \|Y^{(\mathrm{old})}\|_F < \epsilon$ then
6: Assign $Y_\lambda \leftarrow Y^{(\mathrm{new})}$, $Y^{(\mathrm{old})} \leftarrow Y^{(\mathrm{new})}$, and go to step 2.
7: else
8: Assign $Y^{(\mathrm{old})} \leftarrow Y^{(\mathrm{new})}$ and go to step 3.
9: end if
10: end for
11: Output the sequence of solutions $Y_{\lambda_1}, \ldots, Y_{\lambda_k}$.
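The following NumPy sketch of Algorithm 1 is our own reading of the listing above; it reuses the `P_Omega` and `shrink` helpers defined earlier and makes the warm-starting over the $\lambda$ sequence explicit. It relies on a dense SVD of the full $n \times p$ matrix at every iteration, which is exactly the cost the rest of the paper seeks to avoid.

```python
def soft_impute(X, mask, lambdas, eps=1e-4, max_iter=500):
    """Soft-impute (Algorithm 1): one warm-started solution per lambda, largest lambda first."""
    Y_old = np.zeros_like(X)
    solutions = []
    for lam in sorted(lambdas, reverse=True):
        for _ in range(max_iter):
            M = shrink(Y_old, lam)                              # step 3
            Y_new = P_Omega(X, mask) + np.where(mask, 0.0, M)   # step 4: observed from X, missing from M
            rel_change = np.linalg.norm(Y_new - Y_old) / (np.linalg.norm(Y_old) + 1e-12)
            Y_old = Y_new
            if rel_change < eps:                                # step 5: move on to the next lambda
                break
        solutions.append(Y_old.copy())
    return solutions
```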
Methodology

Before we present our approach, we give some notation and definitions. For an $n \times m$ matrix $A = [a_{ij}]$, let $\mathrm{vec}(A) = (a_{11}, \ldots, a_{n1}, a_{12}, \ldots, a_{nm})^T$ be the $nm \times 1$ vector obtained by stacking the columns of $A$. In addition, $A \otimes B = [a_{ij} B]$ represents the Kronecker product of $A$ and $B$, and $K_{nm}$ ($nm \times nm$) is the commutation matrix which transforms $\mathrm{vec}(A)$ into $\mathrm{vec}(A^T)$ (i.e., $K_{nm} \mathrm{vec}(A) = \mathrm{vec}(A^T)$). Please refer to (Lütkepohl 1996; Magnus and Neudecker 1999) for properties of the Kronecker product and the commutation matrix.
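For small sizes these objects can be built and checked directly; the helpers below are our own illustration and are reused in later sketches.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator: vec(A) = (a_11, ..., a_n1, a_12, ..., a_nm)^T."""
    return A.reshape(-1, order='F')

def commutation_matrix(n, m):
    """K_{nm} (nm x nm) such that K_{nm} vec(A) = vec(A^T) for an n x m matrix A."""
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1.0   # vec(A^T)[i*m+j] = a_ij = vec(A)[j*n+i]
    return K

# quick check on a random 3 x 2 matrix
A = np.random.randn(3, 2)
assert np.allclose(commutation_matrix(3, 2) @ vec(A), vec(A.T))
```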
Assume that $Y_i$ for $i = 1, \ldots, s$ are $n_i \times p_i$ matrices, where $n = \prod_{i=1}^{s} n_i$ and $p = \prod_{i=1}^{s} p_i$. We consider the following relaxation:

$$\min_{Y_i \in \mathbb{R}^{n_i \times p_i}} \; \frac{1}{2} \|P_\Omega(X) - P_\Omega(Y)\|_F^2 + \sum_{i=1}^{s} \lambda_i \|Y_i\|_*, \tag{6}$$

where $Y = Y_1 \otimes Y_2 \otimes \cdots \otimes Y_s$. For $i = 1, \ldots, s$, let

$$Y_{-i} = Y_1 \otimes \cdots \otimes Y_{i-1} \otimes Y_{i+1} \otimes \cdots \otimes Y_s,$$

which is $(n/n_i) \times (p/p_i)$. We now treat the $Y_j$ for $j \ne i$ as fixed.
Lemma 1 Assume that $X$ is fully observed. If the $Y_j$ for $j \ne i$ are fixed and nonzero, then

$$h(Y_i) = \frac{1}{2} \|X - Y\|_F^2 + \lambda_i \|Y_i\|_*$$

is strictly convex in $Y_i$.
Theorem 1 Assume that $X$ is fully observed. If the $Y_j$ for $j \ne i$ are fixed and nonzero, then the minimum of $h(Y_i)$ is obtained when

$$Y_i = S_{\lambda_i / c_i}\!\left(\frac{1}{c_i} Z_i\right), \qquad c_i = \prod_{j \ne i} \|Y_j\|_F^2,$$

where

$$Z_i = \big(\mathrm{vec}(Y_{-i}^T)^T \otimes I_{n_i}\big)\big(I_{n/n_i} \otimes K^T_{n_i,\, p/p_i}\big)\, X_i$$

is an $n_i \times p_i$ matrix. Here $X_i$ is the $(np/p_i) \times p_i$ matrix defined by

$$\mathrm{vec}(X_i^T) = \mathrm{vec}(R_i X^T Q_i^T),$$

where $Q_i = I_{\mu_{i-1}} \otimes K_{\nu_{s-i},\, n_i}$ ($n \times n$) with

$$\mu_0 = \nu_0 = 1, \quad \mu_i = \prod_{j=1}^{i} n_j, \quad \nu_i = \prod_{j=1}^{i} n_{s+1-j},$$

and $R_i = I_{\pi_{i-1}} \otimes K_{\rho_{s-i},\, p_i}$ ($p \times p$) with

$$\pi_0 = \rho_0 = 1, \quad \pi_i = \prod_{j=1}^{i} p_j, \quad \rho_i = \prod_{j=1}^{i} p_{s+1-j},$$

for $i = 1, \ldots, s$.
The proofs of Lemma 1 and Theorem 1 are given in the Appendix. The theorem motivates us to employ an iterative approach to solving the problem in (6).
In the remainder of the paper we consider the special case of $s = 2$ for simplicity. Let $Y_1 = A = [a_{ij}]$ ($n_1 \times p_1$) and $Y_2 = B = [b_{ij}]$ ($n_2 \times p_2$). Moreover, we partition $X$ into

$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1,p_1} \\ X_{21} & X_{22} & \cdots & X_{2,p_1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n_1,1} & X_{n_1,2} & \cdots & X_{n_1,p_1} \end{bmatrix}, \quad X_{ij} \in \mathbb{R}^{n_2 \times p_2}.$$

Using MATLAB colon notation, we can write $X_{ij} = X((i-1)n_2 + 1 : i n_2,\; (j-1)p_2 + 1 : j p_2)$ for $i = 1, \ldots, n_1$; $j = 1, \ldots, p_1$. Thus,

$$\Phi = \frac{1}{2} \|X - A \otimes B\|_F^2 = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} \|X_{ij} - a_{ij} B\|_F^2 = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} \mathrm{tr}\big[(X_{ij} - a_{ij} B)(X_{ij} - a_{ij} B)^T\big].$$
In order to use the matrix shrinkage operator, we note that the $(i,j)$th element of $Z_1$ is $\mathrm{tr}(X_{ij}^T B)$, i.e., $[Z_1]_{ij} = \mathrm{tr}(X_{ij}^T B)$, and that $Z_2 = \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} a_{ij} X_{ij}$. We defer the derivations of $Z_1$ and $Z_2$ to the appendix.
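For $s = 2$, then, $Z_1$ and $Z_2$ are simple block computations on the partition $\{X_{ij}\}$. The NumPy sketch below (ours, with our own helper names) makes this explicit and is reused in the KP-SVT sketch later.

```python
def blocks(X, n1, p1, n2, p2):
    """Partition X ((n1*n2) x (p1*p2)) into an (n1, p1) grid of n2 x p2 blocks X_ij."""
    return X.reshape(n1, n2, p1, p2).transpose(0, 2, 1, 3)   # shape (n1, p1, n2, p2)

def Z1(X_blocks, B):
    """[Z_1]_ij = tr(X_ij^T B); returns an n1 x p1 matrix."""
    return np.einsum('ijkl,kl->ij', X_blocks, B)

def Z2(X_blocks, A):
    """Z_2 = sum_ij a_ij X_ij; returns an n2 x p2 matrix."""
    return np.einsum('ij,ijkl->kl', A, X_blocks)
```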
According to Theorem 1 and Algorithm 1, we immediately have an algorithm for solving the problem in (6), which is summarized in Algorithm 2. We call it Kronecker Product-Singular Value Thresholding (KP-SVT). We now see that at every iteration the algorithm computes the SVD of two matrices of sizes $n_1 \times p_1$ and $n_2 \times p_2$, respectively. Compared with Algorithm 1, the current algorithm is computationally efficient. In particular, if $n_1 \approx n_2 \approx \sqrt{n}$ and $p_1 \approx p_2 \approx \sqrt{p}$, our algorithm becomes quite effective. Moreover, we can further speed up the algorithm by setting $s > 2$, letting $Y = Y_1 \otimes Y_2 \otimes \cdots \otimes Y_s$ for $s > 2$.
Algorithm 2 Kronecker Product-Singular Value Thresholding (KP-SVT)
1: Initialize $Y = P_\Omega(X)$, $A = \mathbf{1}_{n_1} \mathbf{1}_{p_1}^T$, $B = \mathbf{1}_{n_2} \mathbf{1}_{p_2}^T$, $\lambda_1, \lambda_2 > 0$, tolerance $\epsilon$, maxStep $= k$, step $= 1$.
2: for step < maxStep do
3: step = step + 1.
4: Compute $[A^{(\mathrm{old})}]_{ij} \leftarrow \mathrm{tr}(Y_{ij}^T B^{(\mathrm{old})})$ and $A^{(\mathrm{new})} \leftarrow S_{\lambda_1 / \|B^{(\mathrm{old})}\|_F^2}\big(A^{(\mathrm{old})} / \|B^{(\mathrm{old})}\|_F^2\big)$.
5: Compute $B^{(\mathrm{old})} \leftarrow \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} [A^{(\mathrm{new})}]_{ij}\, Y_{ij}$ and $B^{(\mathrm{new})} \leftarrow S_{\lambda_2 / \|A^{(\mathrm{new})}\|_F^2}\big(B^{(\mathrm{old})} / \|A^{(\mathrm{new})}\|_F^2\big)$.
6: Compute $Y^{(\mathrm{new})} \leftarrow P_\Omega(X) + P_{\bar{\Omega}}(A^{(\mathrm{new})} \otimes B^{(\mathrm{new})})$.
7: if $\|Y^{(\mathrm{new})} - Y^{(\mathrm{old})}\|_F / \|Y^{(\mathrm{old})}\|_F < \epsilon$ then
8: $\hat{Y} = Y^{(\mathrm{new})}$.
9: break.
10: end if
11: end for
12: Output the solution $\hat{Y}$.

Here the $Y_{ij}$ are the $n_2 \times p_2$ blocks of the current estimate $Y$, partitioned in the same way as $X$ above.
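Putting the pieces together, the sketch below is our reading of Algorithm 2 in NumPy, reusing the `P_Omega`, `shrink`, `blocks`, `Z1`, and `Z2` helpers from earlier; the choice of $\lambda_1, \lambda_2$ and the handling of $Y^{(\mathrm{old})}$ between iterations are our assumptions where the listing is terse.

```python
def kp_svt(X, mask, n1, p1, n2, p2, lam1, lam2, eps=1e-4, max_step=200):
    """KP-SVT (Algorithm 2): alternately shrink the Kronecker factors A (n1 x p1) and B (n2 x p2)."""
    Y_old = P_Omega(X, mask)
    B = np.ones((n2, p2))
    for _ in range(max_step):
        Yb = blocks(Y_old, n1, p1, n2, p2)
        # step 4: Z_1 from the current blocks, shrink with threshold lam1 / ||B||_F^2
        A = shrink(Z1(Yb, B) / np.sum(B ** 2), lam1 / np.sum(B ** 2))
        # step 5: Z_2 from the updated A, shrink with threshold lam2 / ||A||_F^2
        B = shrink(Z2(Yb, A) / np.sum(A ** 2), lam2 / np.sum(A ** 2))
        # step 6: keep observed entries from X, fill the missing ones from A kron B
        Y_new = P_Omega(X, mask) + np.where(mask, 0.0, np.kron(A, B))
        # step 7: relative-change stopping rule
        if np.linalg.norm(Y_new - Y_old) / (np.linalg.norm(Y_old) + 1e-12) < eps:
            return Y_new
        Y_old = Y_new
    return Y_old
```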
Experiments

In this section we first demonstrate the convergence of the KP-SVT algorithm through experiments and discuss how to choose proper sizes for $A$ and $B$. Next, we compare our KP-SVT with the soft-impute algorithm (Mazumder, Hastie, and Tibshirani 2010) and the conventional SVT algorithm (Cai, Candès, and Shen 2010), both in terms of accuracy and efficiency. All experiments are implemented in MATLAB, and all the reported results are obtained on a desktop computer with a 2.57 GHz CPU and 4 GB of memory.

Both toy data and real-world datasets are used. In our simulations, we generate $n \times p$ matrices $X$ of rank $q$ by taking $X = UV + \text{noise}$, where $U$ ($n \times q$) and $V$ ($q \times p$) are independent random matrices with i.i.d. entries uniformly distributed between 0 and 1, and the noise term is zero-mean Gaussian white noise. The set of observed entries, $\Omega$, is sampled uniformly at random over the indices of the matrix. The evaluation criterion is the test error

$$\text{test error} = \frac{\|P_{\bar{\Omega}}(X - \hat{Y})\|_F}{\|P_{\bar{\Omega}}(X)\|_F}.$$
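In code, with the `P_Omega` helper from earlier and treating the unobserved entries as the test set (our reading of the criterion above, with `mask` a boolean array marking the observed entries), this is a one-liner:

```python
def test_error(X, Y_hat, mask):
    """Relative Frobenius error over the unobserved entries (mask marks the observed ones)."""
    return (np.linalg.norm(P_Omega(X - Y_hat, ~mask)) /
            np.linalg.norm(P_Omega(X, ~mask)))
```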
We also adopt the MovieLens datasets¹ and the Jester joke dataset² to evaluate the performance of the algorithm. The root mean squared error (RMSE) over the probe set is used as the evaluation criterion on these datasets. Five datasets are used in this section: MovieLens 100K contains 943 users and 1,690 movies with 100,000 ratings (1-5), and MovieLens 1M contains 6,040 users and 4,000 movies with 1 million ratings. Jester joke contains over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 63,974 users. Toy$_1$ is a $1000 \times 1000$ matrix of rank 10 with 20% known entries, and Toy$_2$ is a $5000 \times 5000$ matrix of rank 20 with 10% known entries. For the real-world datasets, 80% of the ratings are used for training and 20% for testing.
Convergence Analysis

We run the KP-SVT algorithm on the toy datasets and the recommendation datasets. Figure 1 depicts the iterative process of the algorithm on these datasets. From Figure 1, we see that the algorithm converges after a relatively small number of iterations.
The Choice of the Sizes of A and B

The main idea of our KP-SVT algorithm is to use $A \otimes B$ instead of $Y$ to approximate the original matrix $X$. By the properties of Kronecker products, we have $n = n_1 n_2$ and $p = p_1 p_2$. If we set $n_1 = n$ and $p_1 = p$, our model degenerates to the conventional low-rank approximation problem. Thus, it is very important to choose appropriate values of $n_1$ and $p_1$.

The main computational bottleneck of SVT-based low-rank approximation algorithms lies in computing the SVD of $n \times p$ matrices, whereas in KP-SVT we only need to compute the SVDs of two sets of smaller matrices. In order to choose appropriate sizes, we make use of the following criterion:

$$\min \; |n_1 - n_2| + |p_1 - p_2| \quad \text{s.t.} \quad n = n_1 n_2 \;\text{ and }\; p = p_1 p_2.$$
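Since the two constraints are independent, the criterion decomposes into picking the most balanced factor pair of $n$ and of $p$ separately; a small helper (ours) is enough:

```python
def balanced_factors(n):
    """Return the factor pair (n1, n2) with n = n1 * n2 that minimizes |n1 - n2|."""
    best = (1, n)
    for n1 in range(1, int(n ** 0.5) + 1):
        if n % n1 == 0:
            best = (n1, n // n1)   # later divisors are closer to sqrt(n), hence more balanced
    return best

# e.g. MovieLens 100K is 943 x 1690: balanced_factors(943) -> (23, 41), balanced_factors(1690) -> (26, 65)
```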
From Table 1 we can see that if we set $(n_1, p_1)$ and $(n_2, p_2)$ neither too large nor too small, the algorithm runs faster (sometimes twice as fast) while at the same time obtaining better recovery accuracy.
Performance Comparison

In this section, we conduct a set of experiments on five different datasets to compare KP-SVT with the soft-impute algorithm (Mazumder, Hastie, and Tibshirani 2010) and the conventional SVT algorithm (Cai, Candès, and Shen 2010). Especially for the MovieLens 1M dataset, we reconfigure the ratio between the training and test sets to demonstrate the robustness of KP-SVT.

¹Downloaded from http://www.grouplens.org/node/73
²Downloaded from http://eigentaste.berkeley.edu/dataset/
Figure 1: Convergence analysis of KP-SVT. (a) Test error versus the number of iterations on the toy datasets (Toy$_1$ and Toy$_2$); (b) RMSE versus the number of iterations on the recommendation datasets (MovieLens 100K and MovieLens 1M).
The sizes of the auxiliary matrices $A$ and $B$ are given in the left part of Table 2. First, let us examine the recovery performance of the three algorithms. For the toy datasets, SVT obtains a slightly better result, while the performance of our KP-SVT is also acceptable. For the real recommendation datasets, unfortunately, SVT does not converge. Note that here we use the MATLAB implementation of the conventional SVT downloaded from the second author's webpage (Cai, Candès, and Shen 2010). Note also that KP-SVT yields better performance than the soft-impute algorithm on most datasets. Turning to the computational times, KP-SVT is generally 5 to 10 times faster than the competitors.
Table 1: The choice of the sizes of A and B.

MovieLens 100K (943 x 1690):
  (n1, p1)    (n2, p2)    RMSE   Time (s)
  (23, 65)    (41, 26)    1.128  13.8
  (41, 65)    (23, 26)    1.130  14.3
  (41, 10)    (23, 169)   1.138  14.9
  (23, 10)    (41, 169)   1.152  16.0
  (41, 169)   (23, 10)    1.130  16.4

MovieLens 1M (6040 x 4000):
  (n1, p1)    (n2, p2)    RMSE   Time (s)
  (151, 80)   (40, 50)    1.101  169.9
  (151, 10)   (40, 400)   1.108  171.7
  (151, 400)  (40, 10)    1.100  199.1
  (604, 400)  (10, 10)    1.132  385.3
  (10, 10)    (604, 400)  1.167  388.1

Table 2: Experimental results of the three algorithms for different training sizes: err is the test error, time is the corresponding computational time (s), and NC means no convergence.

  Dataset    (n1, p1)    (n2, p2)    SVT err  SVT time  Soft-impute err  Soft-impute time  KP-SVT err  KP-SVT time
  Toy_1      (40, 40)    (25, 25)    0.375    21.1      0.381            74.6              0.433       4.7
  Toy_2      (100, 100)  (50, 50)    0.206    237.7     0.522            1051.5            0.281       47.0
  M 100K     (23, 65)    (41, 26)    NC       NC        1.229            156.0             1.128       13.8
  M 1M 70%   (151, 80)   (40, 50)    NC       NC        1.513            305.5             1.114       169.4
  M 1M 80%   (151, 80)   (40, 50)    NC       NC        1.141            387.9             1.101       169.9
  M 1M 90%   (151, 80)   (40, 50)    NC       NC        1.382            314.5             1.096       171.8
  Jester     (1103, 15)  (58, 10)    NC       NC        4.150            271.0             5.094       13.9

Concluding Remarks

In this paper we have proposed a fast spectral relaxation approach to the matrix completion problem by using Kronecker products. In particular, we have devised a Kronecker-product-based algorithm under the soft-impute framework (Mazumder, Hastie, and Tibshirani 2010). Our empirical studies have shown that KP-SVT can substantially reduce computational cost while maintaining high recovery accuracy. The approach can also be applied to speed up other matrix completion methods that are based on the SVD, such as FPCA (Ma, Goldfarb, and Chen 2009).
Although we have illustrated only two-factor products in this paper, Kronecker products can be applied recursively when the size of the matrix in question is large. Another possible proposal is to consider the following optimization problem:

$$\min_{A_i, B_i} \; \frac{1}{2} \|P_\Omega(X) - P_\Omega(Y)\|_F^2 + \sum_{i=1}^{s} \lambda_{1i} \|A_i\|_* + \sum_{i=1}^{s} \lambda_{2i} \|B_i\|_*,$$

where the $A_i$ and $B_i$ have appropriate sizes and $Y = \sum_{i=1}^{s} A_i \otimes B_i$. We will address matrix completion problems using this proposal in future work.
Acknowledgments

This work is supported by the Natural Science Foundations of China (No. 61070239), the 973 Program of China (No. 2010CB327903), the Doctoral Program of the Specialized Research Fund of Chinese Universities (No. 20090101120066), and the Fundamental Research Funds for the Central Universities.
Appendix: The Proofs of Lemma 1 and Theorem 1

Note that $R_i X^T Q_i^T$ is $p \times n$. We are therefore always able to define an $(np/p_i) \times p_i$ matrix $X_i$ such that

$$\mathrm{vec}(X_i^T) = \mathrm{vec}(R_i X^T Q_i^T).$$
Note that $Q_i Q_i^T = I_n$, $R_i R_i^T = I_p$, and $Q_i (Y_1 \otimes \cdots \otimes Y_s) R_i^T = Y_{-i} \otimes Y_i$. Let $\Phi = \frac{1}{2}\|X - Y\|_F^2$. We have

$$\begin{aligned}
d\Phi &= -\mathrm{tr}\big[(X - Y)(Y_1^T \otimes \cdots \otimes dY_i^T \otimes \cdots \otimes Y_s^T)\big] \\
&= -\mathrm{tr}\big[(X - Y)\, R_i^T (Y_{-i}^T \otimes dY_i^T)\, Q_i\big] \\
&= -\mathrm{tr}\big[Q_i X R_i^T (Y_{-i}^T \otimes dY_i^T)\big] + \mathrm{tr}\big[Q_i Y R_i^T (Y_{-i}^T \otimes dY_i^T)\big] \\
&= -\mathrm{tr}\big[Q_i X R_i^T (Y_{-i}^T \otimes dY_i^T)\big] + \mathrm{tr}\big[(Y_{-i} Y_{-i}^T) \otimes (Y_i\, dY_i^T)\big].
\end{aligned}$$

Using some matrix algebra, we further have

$$\begin{aligned}
d\Phi &= -\mathrm{vec}(R_i X^T Q_i^T)^T\, \mathrm{vec}(Y_{-i}^T \otimes dY_i^T) + \Big(\prod_{j \ne i} \|Y_j\|_F^2\Big)\,\mathrm{tr}(Y_i\, dY_i^T) \\
&= -\mathrm{vec}(X_i^T)^T \Big[\big((I_{n/n_i} \otimes K_{n_i,\, p/p_i})(\mathrm{vec}(Y_{-i}^T) \otimes I_{n_i})\big) \otimes I_{p_i}\Big] \mathrm{vec}(dY_i^T) + \Big(\prod_{j \ne i} \|Y_j\|_F^2\Big)\,\mathrm{tr}(Y_i\, dY_i^T) \\
&= -\mathrm{tr}\Big[\big(\mathrm{vec}(Y_{-i}^T)^T \otimes I_{n_i}\big)\big(I_{n/n_i} \otimes K^T_{n_i,\, p/p_i}\big)\, X_i\, dY_i^T\Big] + \Big(\prod_{j \ne i} \|Y_j\|_F^2\Big)\,\mathrm{tr}(Y_i\, dY_i^T) \\
&= -\mathrm{tr}(Z_i\, dY_i^T) + \Big(\prod_{j \ne i} \|Y_j\|_F^2\Big)\,\mathrm{tr}(Y_i\, dY_i^T).
\end{aligned}$$

Here we use the fact that if $A$ and $B$ are $m \times n$ and $p \times q$ matrices, then

$$\mathrm{vec}(A \otimes B) = \Big[\big((I_n \otimes K_{qm})(\mathrm{vec}(A) \otimes I_q)\big) \otimes I_p\Big] \mathrm{vec}(B).$$
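This identity can be checked numerically for small sizes with the `vec` and `commutation_matrix` helpers from the notation section (our check, not part of the paper):

```python
m, n, p, q = 2, 3, 2, 4                       # A is m x n, B is p x q
A = np.random.randn(m, n)
B = np.random.randn(p, q)
M = np.kron(np.eye(n), commutation_matrix(q, m)) @ np.kron(vec(A).reshape(-1, 1), np.eye(q))
assert np.allclose(vec(np.kron(A, B)), np.kron(M, np.eye(p)) @ vec(B))
```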
Accordingly, we conclude that

$$\frac{\partial^2 \Phi}{\partial\, \mathrm{vec}(Y_i^T)\, \partial\, \mathrm{vec}(Y_i^T)^T} = \Big(\prod_{j \ne i} \|Y_j\|_F^2\Big) I_{n_i p_i}.$$

Thus, $h(Y_i)$ is strictly convex in $Y_i$.

We now obtain that $\hat{Y}_i$ minimizes $h$ if and only if $0$ is a subgradient of the function $h$ at the point $\hat{Y}_i$; that is,

$$0 \in \hat{Y}_i - \frac{1}{\prod_{j \ne i} \|Y_j\|_F^2}\, Z_i + \frac{\lambda_i}{\prod_{j \ne i} \|Y_j\|_F^2}\, \partial \|\hat{Y}_i\|_*,$$

where $\partial \|\hat{Y}_i\|_*$ is the set of subgradients of the nuclear norm. This condition is satisfied by $\hat{Y}_i = S_{\lambda_i/c_i}(Z_i/c_i)$ with $c_i = \prod_{j \ne i} \|Y_j\|_F^2$, which completes the proof. □
We now consider the special case of $s = 2$. In this case, we let $Y_1 = A = [a_{ij}]$ ($n_1 \times p_1$) and $Y_2 = B = [b_{ij}]$ ($n_2 \times p_2$). Moreover, we partition $X$ into

$$X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1,p_1} \\ X_{21} & X_{22} & \cdots & X_{2,p_1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n_1,1} & X_{n_1,2} & \cdots & X_{n_1,p_1} \end{bmatrix}, \quad X_{ij} \in \mathbb{R}^{n_2 \times p_2}.$$

Using MATLAB colon notation, we can write $X_{ij} = X((i-1)n_2 + 1 : i n_2,\; (j-1)p_2 + 1 : j p_2)$ for $i = 1, \ldots, n_1$; $j = 1, \ldots, p_1$. Thus,

$$\Phi = \frac{1}{2} \|X - A \otimes B\|_F^2 = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} \|X_{ij} - a_{ij} B\|_F^2 = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} \mathrm{tr}\big[(X_{ij} - a_{ij} B)(X_{ij} - a_{ij} B)^T\big].$$
We have

$$d\Phi = -\sum_{i=1}^{n_1} \sum_{j=1}^{p_1} \mathrm{tr}\big[B (X_{ij} - a_{ij} B)^T\big]\, da_{ij} - \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} a_{ij}\, \mathrm{tr}\big[(X_{ij} - a_{ij} B)\, dB^T\big].$$

It then follows that

$$\frac{\partial \Phi}{\partial A} = \left[\frac{\partial \Phi}{\partial a_{ij}}\right] = -Z_1 + \mathrm{tr}(B B^T)\, A,$$

where the $(i,j)$th element of $Z_1$ is $\mathrm{tr}(X_{ij}^T B)$, i.e., $[Z_1]_{ij} = \mathrm{tr}(X_{ij}^T B)$, and that

$$\frac{\partial \Phi}{\partial B} = -Z_2 + \mathrm{tr}(A A^T)\, B,$$

where $Z_2 = \sum_{i=1}^{n_1} \sum_{j=1}^{p_1} a_{ij} X_{ij}$.
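These two gradient formulas can be sanity-checked against directional finite differences using the `blocks`, `Z1`, and `Z2` helpers from the main text (our check, with arbitrarily chosen small sizes):

```python
n1, p1, n2, p2 = 3, 2, 4, 5
X = np.random.randn(n1 * n2, p1 * p2)
A = np.random.randn(n1, p1)
B = np.random.randn(n2, p2)
Xb = blocks(X, n1, p1, n2, p2)

phi = lambda A, B: 0.5 * np.sum((X - np.kron(A, B)) ** 2)
grad_A = -Z1(Xb, B) + np.sum(B ** 2) * A      # dPhi/dA = -Z_1 + tr(B B^T) A
grad_B = -Z2(Xb, A) + np.sum(A ** 2) * B      # dPhi/dB = -Z_2 + tr(A A^T) B

dA, dB = 1e-6 * np.random.randn(n1, p1), 1e-6 * np.random.randn(n2, p2)
assert np.isclose(phi(A + dA, B) - phi(A, B), np.sum(grad_A * dA), rtol=1e-3)
assert np.isclose(phi(A, B + dB) - phi(A, B), np.sum(grad_B * dB), rtol=1e-3)
```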
References

Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 183–202.
Cai, J.; Candès, E. J.; and Shen, Z. 2010. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20:1956–1982.
Candès, E. J., and Recht, B. 2008. Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9:717–772.
Candès, E. J., and Tao, T. 2009. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory (to appear).
Candès, E. J.; Li, X.; Ma, Y.; and Wright, J. 2009. Robust principal component analysis. Microsoft Research Asia, Beijing, China.
Jain, P.; Meka, R.; and Dhillon, I. 2010. Guaranteed rank minimization via singular value projection. In Neural Information Processing Systems (NIPS) 24.
Keshavan, R.; Montanari, A.; and Oh, S. 2009. Matrix completion from a few entries. In Proceedings of the International Symposium on Information Theory (ISIT 2009).
Kolda, T. G., and Bader, B. W. 2009. Tensor decompositions and applications. SIAM Review 51(3):455–500.
Lin, Z.; Chen, M.; Wu, L.; and Ma, Y. 2010. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical report, Electrical & Computer Engineering Department, University of Illinois at Urbana-Champaign, USA.
Lütkepohl, H. 1996. Handbook of Matrices. New York: John Wiley & Sons.
Ma, S.; Goldfarb, D.; and Chen, L. 2009. Fixed point and Bregman iterative methods for matrix rank minimization. Mathematical Programming Series A.
Magnus, J. R., and Neudecker, H. 1999. Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley & Sons, revised edition.
Mazumder, R.; Hastie, T.; and Tibshirani, R. 2010. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research 11(2):2287–2322.
Recht, B.; Fazel, M.; and Parrilo, P. A. 2007. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3):471–501.
Srebro, N.; Alon, N.; and Jaakkola, T. 2005. Generalization error bounds for collaborative prediction with low-rank matrices. In Advances in Neural Information Processing Systems.
Toh, K., and Yun, S. 2009. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems.
Van Loan, C. F., and Pitsianis, N. 1993. Approximation with Kronecker products. In Moonen, M. S.; Golub, G. H.; and de Moor, B. L. R., eds., Linear Algebra for Large Scale and Real-Time Applications. Dordrecht: Kluwer Academic Publishers. 293–314.
