Low-rank tensor completion by Riemannian optimization
DOI 10.1007/s10543-013-0455-z
Received: 27 June 2013 / Accepted: 22 October 2013 / Published online: 7 November 2013
© Springer Science+Business Media Dordrecht 2013
B. Vandereycken
Department of Mathematics, Princeton University, Fine Hall, Princeton, NJ 08544, USA
e-mail: [email protected]
1 Introduction
This paper is concerned with low-rank completion for tensors in the sense of multi-
dimensional arrays. To be more specific, we aim to solve the tensor completion prob-
lem
$$\min_{X}\ \frac{1}{2}\,\bigl\|P_\Omega X - P_\Omega A\bigr\|^2 \qquad \text{subject to} \qquad X \in \mathcal{M}_r := \bigl\{ X \in \mathbb{R}^{n_1\times n_2\times\cdots\times n_d} \;\big|\; \operatorname{rank}(X) = r \bigr\}. \tag{1.1}$$
Here, rank(X) denotes the multilinear rank [15] of the tensor X, a tuple of d inte-
gers defined via the ranks of the matricizations of X (see Sect. 2.1 for details) and
PΩ : Rn1 ×···×nd → Rn1 ×···×nd is a linear operator. A typical choice for PΩ frequently
encountered in applications is
$$P_\Omega X := \begin{cases} X_{i_1 i_2 \ldots i_d} & \text{if } (i_1, i_2, \ldots, i_d) \in \Omega, \\ 0 & \text{otherwise}, \end{cases}$$
where Ω ⊂ [1, n1 ] × · · · × [1, nd ] denotes the so-called sampling set. In this case, the
objective function $\|P_\Omega X - P_\Omega A\|^2/2$ measures the ability of X to match the entries
of the partially known tensor A in Ω.
The tensor completion problem (1.1) and variants thereof have been discussed a
number of times in the literature. Most of this work builds upon existing work for the
special case d = 2, also known as matrix completion, see [18] for a comprehensive
overview. One of the first approaches to tensor completion has been discussed by Liu
et al. [17]. It is based on extending the notion of nuclear norm to tensors by defining
$\|X\|_*$ as the (weighted) sum of the nuclear norms of the matricizations of X. This
leads to the convex optimization problem
$$\min_{X}\ \|X\|_* \qquad \text{subject to} \qquad P_\Omega X = P_\Omega A.$$
Gandy et al. [11] consider a related formulation based on the (unweighted) sum of the
nuclear norms of the matricizations and propose the use of ADMM (alternating direction method of multipliers) and other
splitting methods. This approach has been shown to yield good recovery results when
applied to tensors from various fields such as medical imaging, hyperspectral images
and seismic data. However, nuclear norm minimization approaches are usually quite
costly and involve singular value decompositions of potentially very large matrices.
Liu and Shang [16] recently proposed the use of the economy-sized QR decomposition,
reducing the cost per iteration step considerably.
Besides the two approaches described above, a number of variations [20] and al-
ternatives have been discussed in the literature. For example, [17] proposes a block
coordinate descent method while [23] proposes an iterative hard thresholding method
for fitting the factors of a Tucker decomposition. In [3], gradient optimization tech-
niques have been proposed for fitting the factors of a CP decomposition. Closely related
to the approach considered in this paper, Da Silva and Herrmann [8] have recently
proposed to perform tensor completion in the hierarchical Tucker format via Rieman-
nian optimization.
The approach proposed in this paper is based on the observation that the set of
tensors of fixed multilinear rank r, denoted by Mr , forms a smooth manifold [14,
28]. Manifold structure for low-rank tensors has recently been exploited in a num-
ber of works targeting applications in numerical analysis and computational physics,
see [12] for an overview. We will make use of this manifold structure by viewing (1.1)
as an unconstrained optimization problem on Mr . This view allows for the use of Rie-
mannian optimization techniques [2]. A similar approach has been considered in [19,
21, 30] for the matrix case, where it was shown to be competitive to other state-of-
the-art approaches to matrix completion. Note that it is not entirely trivial to extend
such Riemannian optimization techniques from the matrix to the tensor case, due to
the lack of a simple characterization of the metric projection onto Mr [15].
The rest of this paper is organized as follows. In Sect. 2, we recall differential
geometric properties of tensors having fixed multilinear rank and propose a suitable
retraction map for Mr . Section 3 proposes the use of the nonlinear CG algorithm on
Mr for solving (1.1), for which several algorithmic details as well as convergence
properties are discussed. Finally, in Sect. 4, we investigate the effectiveness of our
algorithm for various test cases, including synthetic data, hyperspectral images, and
function-related tensors.
Throughout this paper, we will follow the notation in the survey paper by Kolda and
Bader [15]. In the following, we give a brief summary.
The $i$th mode matricization
$$X_{(i)} \in \mathbb{R}^{n_i \times \prod_{j\neq i} n_j}$$
of a tensor $X \in \mathbb{R}^{n_1\times\cdots\times n_d}$ arranges the mode-$i$ fibers of $X$ as the columns of a matrix.
The multilinear rank of $X$ is the tuple $r = (r_1, \ldots, r_d)$ with $r_i = \operatorname{rank}(X_{(i)})$. On
$\mathbb{R}^{n_1\times\cdots\times n_d}$ we use the standard inner product
$$\langle X, Y \rangle = \sum_{i_1=1}^{n_1}\cdots\sum_{i_d=1}^{n_d} X_{i_1\ldots i_d}\, Y_{i_1\ldots i_d} \tag{2.1}$$
and the induced norm $\|X\| = \sqrt{\langle X, X\rangle}$. Every tensor $X$ of multilinear rank $r$ can be
represented in the Tucker decomposition
$$X = C \times_1 U_1 \times_2 U_2 \cdots \times_d U_d \tag{2.2}$$
with the core tensor $C \in \mathbb{R}^{r_1\times\cdots\times r_d}$ and the basis matrices $U_i \in \mathbb{R}^{n_i\times r_i}$, where $\times_i$
denotes the $i$th mode product of a tensor with a matrix [15]. Without loss of
generality, all $U_i$ are orthonormal, $U_i^T U_i = I_{r_i}$, which will be assumed for the rest
of the paper.
Let us denote the truncation of a tensor $X$ to multilinear rank $r$ using the higher
order singular value decomposition (HOSVD) [9] by $P^{\mathrm{HO}}_r$. The HOSVD procedure
can be described by the successive application of best rank-$r_i$ approximations $P_i^{r_i}$ in
each mode $i = 1, \ldots, d$:
$$P^{\mathrm{HO}}_r : \mathbb{R}^{n_1\times\cdots\times n_d} \to \mathcal{M}_r, \qquad X \mapsto P_d^{r_d} \circ \cdots \circ P_1^{r_1}\, X.$$
Although $P^{\mathrm{HO}}_r X$ is in general not the best multilinear rank-$r$ approximation of $X$, it
satisfies the quasi-optimality property
$$\bigl\|X - P^{\mathrm{HO}}_r X\bigr\| \le \sqrt{d}\, \bigl\|X - P_{\mathcal{M}_r}(X)\bigr\|, \tag{2.3}$$
where $P_{\mathcal{M}_r}(X)$ denotes a best approximation of $X$ in $\mathcal{M}_r$. Moreover, as the following
argument shows, $P^{\mathrm{HO}}_r$ is smooth in a neighborhood of every $X \in \mathcal{M}_r$.
Proof Let $\mathcal{D}_i$ denote the open set of tensors whose $i$th mode matricization has a
nonzero gap between the $r_i$th and the $(r_i + 1)$th singular values. From standard results
in matrix perturbation theory [7], it follows that each projector $P_i^{r_i}$ is smooth and
well defined on $\mathcal{D}_i$. Since $X \in \mathcal{M}_r$ is contained in every $\mathcal{D}_i$ and is a fixed point of every
$P_i^{r_i}$, it is possible to construct an open neighborhood $\mathcal{D} \subseteq \mathbb{R}^{n_1\times\cdots\times n_d}$ of $X$ such that
$P_{i-1}^{r_{i-1}} \circ \cdots \circ P_1^{r_1}\, \mathcal{D} \subseteq \mathcal{D}_i$ for all $i$. Hence, the chain rule yields the smoothness of the
operator $P^{\mathrm{HO}}_r$ on $\mathcal{D}$.
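For concreteness, the following MATLAB sketch implements the truncation $P^{\mathrm{HO}}_r$ by the successive rank-$r_i$ truncations described above, using only built-in functions (the geomCG implementation relies on the Tensor Toolbox [4] instead). It returns the Tucker factors of the truncated tensor; all function names are illustrative.

```matlab
% Sketch of the HOSVD truncation P_r^HO: successive best rank-r(i)
% approximations in each mode, returned in Tucker form (core C, bases U).
function [C, U] = hosvd_truncate(X, r)
    d = ndims(X);
    U = cell(d, 1);
    for i = 1:d
        Xi = unfold(X, i);                    % mode-i matricization
        [Ui, ~, ~] = svd(Xi, 'econ');
        U{i} = Ui(:, 1:r(i));                 % dominant r(i)-dimensional subspace
        X = ttm(X, U{i}', i);                 % compress mode i
    end
    C = X;                                    % core tensor of size r(1) x ... x r(d)
end

function Xi = unfold(X, i)                    % mode-i matricization via permute/reshape
    n = size(X);
    Xi = reshape(permute(X, [i, 1:i-1, i+1:numel(n)]), n(i), []);
end

function Y = ttm(X, A, i)                     % mode-i product Y = X x_i A
    n = size(X); perm = [i, 1:i-1, i+1:numel(n)];
    Y = ipermute(reshape(A * unfold(X, i), [size(A, 1), n(perm(2:end))]), perm);
end
```

A call `[C, U] = hosvd_truncate(X, [r1 r2 r3])` returns the factors of $P^{\mathrm{HO}}_r X = C \times_1 U\{1\} \times_2 U\{2\} \times_3 U\{3\}$.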
Observe that $\dim(\mathcal{M}_r)$ is much smaller than the dimension of $\mathbb{R}^{n_1\times\cdots\times n_d}$ when
$r_i \ll n_i$. The Tucker decomposition (2.2) allows for the efficient representation and
manipulation of tensors in $\mathcal{M}_r$.
According to [14], the tangent space of $\mathcal{M}_r$ at $X = C \times_1 U_1 \cdots \times_d U_d$ can be
parametrized as
$$T_X\mathcal{M}_r = \biggl\{\, G \times_{i=1}^{d} U_i + \sum_{i=1}^{d} C \times_i V_i \times_{j\neq i} U_j \;\bigg|\; V_i^T U_i = 0,\ i = 1, \ldots, d \,\biggr\}, \tag{2.4}$$
where $G \in \mathbb{R}^{r_1\times\cdots\times r_d}$ and $V_i \in \mathbb{R}^{n_i\times r_i}$ are the free parameters. Furthermore, the orthogonal projection of a tensor $A \in \mathbb{R}^{n_1\times\cdots\times n_d}$ onto $T_X\mathcal{M}_r$ is given by
$$P_{T_X\mathcal{M}_r}: A \mapsto \Bigl( A \times_{j=1}^{d} U_j^T \Bigr) \times_{i=1}^{d} U_i \;+\; \sum_{i=1}^{d} \Bigl( C \times_i P^{\perp}_{U_i} \bigl[ A \times_{j\neq i} U_j^T \bigr]_{(i)} C_{(i)}^{\dagger} \Bigr) \times_{k\neq i} U_k. \tag{2.5}$$
Here, $C_{(j)}^{\dagger}$ denotes the pseudo-inverse of $C_{(j)}$. Note that $C_{(j)}$ has full row rank and
hence $C_{(j)}^{\dagger} = C_{(j)}^{T}\bigl(C_{(j)} C_{(j)}^{T}\bigr)^{-1}$. We use $P^{\perp}_{U_i} := I_{n_i} - U_i U_i^T$ to denote the orthogonal
projection onto the orthogonal complement of $\operatorname{span}(U_i)$.
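As an illustration, the projection (2.5) can be realized for $d = 3$ and a dense (small) tensor $A$ with a few matricizations and Kronecker products, as in the following MATLAB sketch; the matricization convention of [15] is realized by a column-major reshape after permuting the $i$th mode to the front. The sparse variant actually needed for the gradient in Sect. 3 requires the specialized routines mentioned in Sect. 4; all names below are illustrative.

```matlab
% Sketch of the projection (2.5) for d = 3: returns the parameters G, V{i}
% of the tangent tensor P_{T_X M_r}(A), given the Tucker factors (C, U) of X.
function [G, V] = project_tangent3(A, C, U)
    U1 = U{1}; U2 = U{2}; U3 = U{3};
    A1 = reshape(A, size(A, 1), []);                       % A_(1)
    A2 = reshape(permute(A, [2 1 3]), size(A, 2), []);     % A_(2)
    A3 = reshape(permute(A, [3 1 2]), size(A, 3), []);     % A_(3)
    C1 = reshape(C, size(C, 1), []);                       % C_(1)
    C2 = reshape(permute(C, [2 1 3]), size(C, 2), []);     % C_(2)
    C3 = reshape(permute(C, [3 1 2]), size(C, 3), []);     % C_(3)
    % core part: G = A x_1 U1' x_2 U2' x_3 U3'
    G = reshape(U1' * A1 * kron(U3, U2), size(C));
    % basis parts: V_i = P_{U_i}^perp [A x_{j~=i} U_j']_(i) C_(i)^dagger,
    % using C_(i)^dagger = C_(i)'*(C_(i)*C_(i)')^{-1} since C_(i) has full row rank
    V1 = (A1 * kron(U3, U2) * C1') / (C1 * C1');
    V2 = (A2 * kron(U3, U1) * C2') / (C2 * C2');
    V3 = (A3 * kron(U2, U1) * C3') / (C3 * C3');
    V = { V1 - U1 * (U1' * V1);                            % apply P_{U_i}^perp
          V2 - U2 * (U2' * V2);
          V3 - U3 * (U3' * V3) };
end
```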
As a metric on Mr , we will use the Euclidean metric from the embedded space in-
duced by the inner product (2.1). Together with this metric, Mr becomes a Rieman-
nian manifold. This in turn allows us to define the Riemannian gradient of an objec-
tive function, which can be obtained from the projection of the Euclidean gradient
into the tangent space.
Proposition 2.2 ([2, Chap. 3.6]) Let $f: \mathbb{R}^{n_1\times\cdots\times n_d} \to \mathbb{R}$ be a cost function with
Euclidean gradient $\nabla f_X$ at the point $X \in \mathcal{M}_r$. Then the Riemannian gradient of
$f: \mathcal{M}_r \to \mathbb{R}$ is given by $\operatorname{grad} f(X) = P_{T_X\mathcal{M}_r}(\nabla f_X)$.
2.4 Retraction
A retraction maps a tangent vector $\xi \in T_X\mathcal{M}_r$ back to the manifold; see Fig. 1 for an
illustration. For an embedded submanifold $\mathcal{M}$ of a Euclidean space, a natural candidate
is the metric projection
$$P_{\mathcal{M}}(x + \xi) = \operatorname*{argmin}_{y \in \mathcal{M}} \|x + \xi - y\|, \tag{2.7}$$
which yields a retraction
$$R: \mathcal{U} \to \mathcal{M}, \qquad (x, \xi) \mapsto P_{\mathcal{M}}(x + \xi)$$
on a suitable neighborhood $\mathcal{U}$ of the zero section of the tangent bundle [1]. For $\mathcal{M}_r$,
however, the metric projection admits no simple characterization. Instead, we use the
HOSVD truncation and show that
$$R: T\mathcal{M}_r \to \mathcal{M}_r, \qquad (X, \xi) \mapsto P^{\mathrm{HO}}_r(X + \xi) \tag{2.8}$$
is a retraction on $\mathcal{M}_r$ around $X$. To verify this, write (2.8) as the composition
$$R: T\mathcal{M}_r \to \mathcal{M}_r, \qquad (X, \xi) \mapsto \bigl(P^{\mathrm{HO}}_r \circ F\bigr)(X, \xi),$$
where $F: T\mathcal{M}_r \to \mathbb{R}^{n_1\times\cdots\times n_d}$, $(X, \xi) \mapsto X + \xi$. Since $F$ is smooth and $P^{\mathrm{HO}}_r$ is smooth
in a neighborhood of $\mathcal{M}_r$, the map $R$ is smooth around the zero tangent vector.
Fig. 1 Graphical representation of the concept of retraction and vector transport within the framework of
Riemannian optimization techniques
Definition 2.1(b) follows from the fact that the application of the HOSVD to ele-
ments in Mr leaves them unchanged.
It remains to check Definition 2.1(c), the local rigidity condition. Because the
tangent space $T_X\mathcal{M}_r$ is a first-order approximation of $\mathcal{M}_r$ around $X$, we have
$\|(X + t\xi) - P_{\mathcal{M}_r}(X + t\xi)\| = O(t^2)$ for $t \to 0$. Thus, using (2.3),
$$\bigl\|(X + t\xi) - R(X, t\xi)\bigr\| \le \sqrt{d}\, \bigl\|(X + t\xi) - P_{\mathcal{M}_r}(X + t\xi)\bigr\| = O(t^2),$$
which implies $\tfrac{\mathrm{d}}{\mathrm{d}t} R(X, t\xi)\big|_{t=0} = \xi$ and hence local rigidity.
As a vector transport, which maps a tangent vector at $X$ to a tangent vector at $Y$ (see Fig. 1), we use the orthogonal projection onto the new tangent space:
$$\mathcal{T}_{X\to Y}: T_X\mathcal{M}_r \to T_Y\mathcal{M}_r, \qquad \xi \mapsto P_{T_Y\mathcal{M}_r}(\xi).$$
3 Nonlinear Riemannian CG
With the concepts introduced in Sect. 2, we have all the necessary geometric in-
gredients for performing Riemannian optimization on the manifold Mr of low-rank
tensors. In particular, the nonlinear CG algorithm discussed in [2, Sect. 8.3] yields
Algorithm 1. This can be seen as an extension of the standard nonlinear CG algo-
rithm [22], with the Euclidean gradient replaced by the Riemannian gradient. Ap-
plying retraction after each optimization step ensures that we stay on the manifold.
Finally, the use of vector transport allows us to calculate conjugate directions using
the Polak-Ribière+ (PR+) update rule. If the search directions become insufficiently
gradient-related during the iteration, the algorithm should revert to steepest descent,
see [5]. A standard Armijo backtracking scheme is added to control the step sizes,
using the result of a linearized line search procedure as an initial guess.
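The overall structure of Algorithm 1 is sketched below in MATLAB, with the geometric kernels (Riemannian gradient, vector transport, retraction, inner product, line search) passed as function handles. This is only a structural sketch under the assumption that tangent tensors are stored in a form supporting addition and scalar multiplication; the handle names are ours and do not correspond to the geomCG source code.

```matlab
% Structural sketch of the nonlinear Riemannian CG iteration (Algorithm 1).
%   grad(X)           Riemannian gradient at X
%   transport(X,Y,z)  vector transport T_{X->Y}(z)
%   retract(X,eta)    retraction R(X, eta) onto M_r
%   ip(X,z1,z2)       inner product of tangent tensors z1, z2 at X
%   linesearch(X,eta) step size: linearized initial guess + Armijo backtracking
function X = riemannian_cg(X, grad, transport, retract, ip, linesearch, maxit, tol)
    xi  = grad(X);
    eta = -xi;                                 % start with steepest descent
    for k = 1:maxit
        alpha = linesearch(X, eta);
        Xnew  = retract(X, alpha * eta);       % stay on the manifold
        xinew = grad(Xnew);
        if sqrt(ip(Xnew, xinew, xinew)) < tol
            X = Xnew;
            return;
        end
        % Polak-Ribiere+ update; previous quantities are transported to T_{Xnew}
        beta = max(0, ip(Xnew, xinew, xinew - transport(X, Xnew, xi)) ...
                      / ip(X, xi, xi));
        eta  = -xinew + beta * transport(X, Xnew, eta);
        % (a reset eta = -xinew would be used if eta is not gradient-related)
        X  = Xnew;
        xi = xinew;
    end
end
```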
The calculation of the Riemannian gradient (2.6) requires the explicit computation of
individual entries of a tensor X from its Tucker decomposition:
$$X_{i_1 i_2 \ldots i_d} = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \cdots \sum_{j_d=1}^{r_d} C_{j_1 j_2 \ldots j_d}\, (U_1)_{i_1 j_1} (U_2)_{i_2 j_2} \cdots (U_d)_{i_d j_d}.$$
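This sum is cheap to evaluate entry by entry. A possible MATLAB realization for $d = 3$, evaluating a Tucker tensor at a list of sampled multi-indices (as needed for $P_\Omega X$), is sketched below; the names are illustrative, and geomCG uses a compiled implementation for performance.

```matlab
% Sketch: evaluate X = C x_1 U{1} x_2 U{2} x_3 U{3} at the multi-indices
% given by the rows of subs (one sample per row), at O(r1*r2*r3) cost each.
function vals = tucker_entries(C, U, subs)
    m = size(subs, 1);
    vals = zeros(m, 1);
    [r1, r2, r3] = size(C);
    for k = 1:m
        u1 = U{1}(subs(k, 1), :);              % 1 x r1, selected row of U1
        u2 = U{2}(subs(k, 2), :);              % 1 x r2
        u3 = U{3}(subs(k, 3), :);              % 1 x r3
        t = reshape(C, r1, []).' * u1.';       % contract mode 1: (r2*r3) x 1
        t = reshape(t, r2, r3).' * u2.';       % contract mode 2: r3 x 1
        vals(k) = u3 * t;                      % contract mode 3
    end
end
```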
The Euclidean gradient of the objective function (1.1) is given by the sparse tensor
$E := P_\Omega X - P_\Omega A$. According to (2.5), the Riemannian gradient is obtained by
projecting $E$ onto the tangent space, which yields the factorized representation
$$\operatorname{grad} f(X) = P_{T_X\mathcal{M}_r}(E) = G \times_{j=1}^{d} U_j + \sum_{i=1}^{d} C \times_i V_i \times_{j\neq i} U_j, \tag{3.1}$$
where
$$G := E \times_{j=1}^{d} U_j^T, \qquad V_i := P^{\perp}_{U_i}\bigl[\, E \times_{j\neq i} U_j^T \,\bigr]_{(i)}\, C_{(i)}^{\dagger}.$$
To calculate the new search direction, we use the Polak-Ribière+ update formula
adapted to Riemannian optimization, see [2, 22]:
$$\beta_k = \max\left( 0,\ \frac{\bigl\langle \operatorname{grad} f(X_k),\ \operatorname{grad} f(X_k) - \mathcal{T}_{X_{k-1}\to X_k} \operatorname{grad} f(X_{k-1}) \bigr\rangle}{\bigl\|\operatorname{grad} f(X_{k-1})\bigr\|^2} \right). \tag{3.2}$$
The vector transport in (3.2) amounts to computing $P_{T_{X_k}\mathcal{M}_r}(\xi_{k-1})$, where
$\xi_{k-1} = \operatorname{grad} f(X_{k-1})$ is assumed to be given in the factorized form (3.1). Moreover,
$X_k \in \mathcal{M}_r$ is given in terms of a Tucker decomposition $X_k = C \times_{i=1}^{d} U_i$. As in the
previous section, we obtain
$$P_{T_{X_k}\mathcal{M}_r}(\xi_{k-1}) = G \times_{j=1}^{d} U_j + \sum_{i=1}^{d} C \times_i V_i \times_{j\neq i} U_j, \tag{3.3}$$
where
$$G := \xi_{k-1} \times_{j=1}^{d} U_j^T, \qquad V_i := P^{\perp}_{U_i}\bigl[\, \xi_{k-1} \times_{j\neq i} U_j^T \,\bigr]_{(i)}\, C_{(i)}^{\dagger}.$$
To compute $G$ and $V_i$, we make use of the linearity in $\xi_{k-1}$ and process each summand in the representation (3.1) of $\xi_{k-1}$ separately. By exploiting the tensor product
structure of each summand, we then arrive at a total cost of $O(nr^d)$ operations.
Further, the evaluation of (3.2) requires the inner product between the tensor
$P_{T_{X_k}\mathcal{M}_r}(\xi_{k-1})$ in (3.3) and $\xi_k = \operatorname{grad} f(X_k)$, which is also given in factorized form:
$$\xi_k = \tilde{G} \times_{j=1}^{d} U_j + \sum_{i=1}^{d} C \times_i \tilde{V}_i \times_{j\neq i} U_j.$$
Utilizing the orthogonality of $U_i$ and the uniqueness condition $U_i^T V_i = U_i^T \tilde{V}_i = 0$
for the tangent space, see (2.4), we obtain
$$\bigl\langle \xi_k,\ P_{T_{X_k}\mathcal{M}_r}(\xi_{k-1}) \bigr\rangle = \langle \tilde{G},\, G \rangle + \sum_{i=1}^{d} \bigl\langle C,\ C \times_i \tilde{V}_i^{T} V_i \bigr\rangle.$$
The evaluation of the (smaller) inner products requires $O(nr^2 + r^{d+1})$ operations.
The norm of $\xi_{k-1}$ appearing in the denominator of (3.2) is computed analogously.
Hence, the total cost for computing $\beta_k$ is given by $O(nr^d)$ operations.
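The inner product above can be formed directly from the factorized parameters, as the following MATLAB sketch illustrates for two tangent tensors at the same point $X$ with parameters $(G, V_i)$ and $(\tilde G, \tilde V_i)$; the names are illustrative.

```matlab
% Sketch: <xi, zeta> for two tangent tensors in the factorized form (3.1)
% at the same X, using orthonormality of U_i and U_i'*V_i = U_i'*Vt_i = 0.
% Cost: O(n r^2) for the products V{i}'*Vt{i} plus O(r^(d+1)) for the cores.
function val = tangent_inner(C, G, V, Gt, Vt)
    val = G(:)' * Gt(:);                                   % <G, Gt>
    d = ndims(C);
    for i = 1:d
        Ci = reshape(permute(C, [i, 1:i-1, i+1:d]), size(C, i), []);   % C_(i)
        M  = V{i}' * Vt{i};                                % r_i x r_i
        val = val + Ci(:)' * reshape(M * Ci, [], 1);       % <C x_i V_i, C x_i Vt_i>
    end
end
```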
Once $\beta_k$ has been determined, the new conjugate direction is computed as
$$\eta_k = -\xi_k + \beta_k\, \mathcal{T}_{X_{k-1}\to X_k}\, \eta_{k-1},$$
where $\eta_{k-1} \in T_{X_{k-1}}\mathcal{M}_r$ is the previous conjugate direction. The vector transport is
performed exactly in the same way as above. The obtained tensor in TXk Mr is multi-
plied by βk and added to −ξk ∈ TXk Mr . Due to linearity, the addition of two tensors
in the same tangent space is performed by simply adding the corresponding coeffi-
cients G and Vi .
To obtain the next iterate, Algorithm 1 retracts the updated tensor X + αη back to the
manifold by means of the HOSVD. When performing this retraction, we will exploit
the fact that X ∈ Mr is in Tucker decomposition and η ∈ TX Mr is represented in the
factorized form (3.1):
$$\begin{aligned}
X + \alpha\eta &= C \times_{i=1}^{d} U_i + \alpha \Bigl( G \times_{i=1}^{d} U_i + \sum_{i=1}^{d} C \times_i V_i \times_{j\neq i} U_j \Bigr) \\
&= (C + \alpha G) \times_{i=1}^{d} U_i + \alpha \sum_{i=1}^{d} C \times_i V_i \times_{j\neq i} U_j \\
&= S \times_{i=1}^{d} [U_i,\ V_i],
\end{aligned}$$
where $S \in \mathbb{R}^{2r_1\times\cdots\times 2r_d}$ has the special structure depicted in Fig. 2. After orthogonalizing the combined basis matrices $[U_i, V_i]$ and a corresponding update of $S$, we can
then restrict the application of the HOSVD to the smaller tensor $S$, which requires
only $O(r^{d+1})$ operations. The retraction is completed by multiplying the basis matrices obtained from the HOSVD of $S$ to the combined basis factors. In total, the
retraction requires $O(nr^2 + r^{d+1})$ operations.
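A MATLAB sketch of this structured retraction for $d = 3$ is given below; it reuses the `hosvd_truncate`, `unfold`, and `ttm` helpers from the earlier HOSVD sketch and is meant as an illustration of the procedure, not as the geomCG implementation.

```matlab
% Sketch of R(X, alpha*eta) for d = 3: build the enlarged core S (Fig. 2),
% orthogonalize the combined bases [U_i, V_i], and truncate only the small core.
function [Cnew, Unew] = retract_tucker3(C, U, G, V, alpha, r)
    [r1, r2, r3] = size(C);
    S = zeros(2*r1, 2*r2, 2*r3);                  % enlarged core, mostly zero
    S(1:r1,     1:r2,     1:r3    ) = C + alpha * G;
    S(r1+1:end, 1:r2,     1:r3    ) = alpha * C;  % block for C x_1 V_1
    S(1:r1,     r2+1:end, 1:r3    ) = alpha * C;  % block for C x_2 V_2
    S(1:r1,     1:r2,     r3+1:end) = alpha * C;  % block for C x_3 V_3
    Q = cell(3, 1);
    for i = 1:3
        [Q{i}, Ri] = qr([U{i}, V{i}], 0);         % orthogonalize combined basis
        S = ttm(S, Ri, i);                        % absorb the R-factor into S
    end
    [Cnew, W] = hosvd_truncate(S, r);             % HOSVD of the small core only
    Unew = cell(3, 1);
    for i = 1:3
        Unew{i} = Q{i} * W{i};                    % bases of the retracted tensor
    end
end
```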
Following [30], we obtain an initial guess for the step size α in Algorithm 1 by per-
forming exact line search along the tangent space. This leads to the optimization
problem
$$\alpha^* = \operatorname*{argmin}_{\alpha}\ \bigl\| P_\Omega(X + \alpha\xi) - P_\Omega A \bigr\|^2,$$
which can be solved explicitly:
$$\alpha^* = \frac{\bigl\langle P_\Omega\, \xi,\ P_\Omega(A - X)\bigr\rangle}{\bigl\langle P_\Omega\, \xi,\ P_\Omega\, \xi\bigr\rangle}. \tag{3.4}$$
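In code, the initial guess (3.4) is a single quotient of two inner products over the sampled entries, after which the Armijo backtracking of Algorithm 1 only shrinks the step if necessary. A minimal MATLAB sketch, with handles for the cost and the retraction and standard Armijo constants (all names are ours):

```matlab
% Sketch: step size selection. eta_om and res_om contain the values of
% P_Omega(eta) and P_Omega(A - X) on the sampled entries; slope is the
% directional derivative <grad f(X), eta> < 0 of the cost along eta.
function alpha = step_size(fhandle, retract, X, eta, eta_om, res_om, slope)
    alpha = dot(eta_om, res_om) / dot(eta_om, eta_om);  % linearized step (3.4)
    c = 1e-4; rho = 0.5;                                % standard Armijo constants
    f0 = fhandle(X);
    while fhandle(retract(X, alpha * eta)) > f0 + c * alpha * slope
        alpha = rho * alpha;                            % backtrack
        if alpha < 1e-12, break; end                    % safeguard
    end
end
```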
To discuss the convergence of Algorithm 1, we follow [30] and consider, for $\mu > 0$, the regularized cost function
$$g(X) := \frac{1}{2}\bigl\|P_\Omega X - P_\Omega A\bigr\|^2 + \mu^2 \sum_{i=1}^{d} \Bigl( \|X_{(i)}\|^2 + \bigl\|X_{(i)}^{\dagger}\bigr\|^2 \Bigr).$$

Proposition 3.2 Let $(X_k)$ be an infinite sequence of iterates generated by Algorithm 1 applied to the cost function $g$. Then
$$\lim_{k\to\infty} \operatorname{grad} g(X_k) = 0.$$
Proof By construction of the line search, all iterates $X_k$ fulfill $g(X_k) \le g(X_0)$ and
therefore
$$\frac{1}{2}\bigl\|P_\Omega X_k - P_\Omega A\bigr\|^2 + \mu^2 \sum_{i=1}^{d} \Bigl( \|X_{k,(i)}\|^2 + \bigl\|X_{k,(i)}^{\dagger}\bigr\|^2 \Bigr) \le g(X_0) =: C_0^2,$$
yielding upper and lower bounds for the largest and smallest singular values of the matricizations, respectively:
$$\sigma_{\max}(X_{k,(i)}) \le \|X_{k,(i)}\| \le C_0/\mu, \qquad \sigma_{\min}^{-1}(X_{k,(i)}) \le \bigl\|X_{k,(i)}^{\dagger}\bigr\| \le C_0/\mu.$$
Hence, all iterates stay within the set $B := \{X \in \mathcal{M}_r \mid \sigma_{\max}(X_{(i)}) \le C_0/\mu,\ \sigma_{\min}(X_{(i)}) \ge \mu/C_0 \text{ for } i = 1, \ldots, d\}$, which is a compact subset of $\mathcal{M}_r$.
Now suppose, contrary to the statement of the proposition, that $\operatorname{grad} g(X_k)$ does
not converge to zero. Then there is $\delta > 0$ and a subsequence of $(X_k)$ such that
$\|\operatorname{grad} g(X_k)\| > \delta$ for all elements of the subsequence. Since $X_k \in B$ and $B$ is compact, it follows that
this subsequence has an accumulation point $X_*$ for which also $\|\operatorname{grad} g(X_*)\| \ge \delta$.
However, this contradicts [2, Theorem 4.3.1], which states that every accumulation
point of the iterates is a critical point of $g$.
The evaluation of $\operatorname{grad} g$ involves the derivative of the pseudo-inverse of the matricizations,
$$\partial X_{(i)}^{\dagger} = -X_{(i)}^{\dagger}\,(\partial X_{(i)})\,X_{(i)}^{\dagger} + \bigl(I - X_{(i)}^{\dagger} X_{(i)}\bigr)\,(\partial X_{(i)})^{T}\, X_{(i)}^{\dagger\,T} X_{(i)}^{\dagger},$$
which can be evaluated in terms of the singular value decomposition $X_{(i)} = U_i \Sigma_i V_i^T$. The operation $[\,\cdot\,]^{(i)}$
reverses the matricization, that is, $\bigl[X_{(i)}\bigr]^{(i)} = X$.
The statement of Proposition 3.2 holds for arbitrarily small μ. If the smallest sin-
gular values of the matricizations stay bounded from below as μ → 0, that is, the
accumulation points X∗ of {Xk } do not approach the boundary of Mr as μ → 0,
then (3.7) shows that gradf (X∗ ) → 0 as μ → 0. Thus, the regularization term be-
comes negligible in such a situation. For more details, we refer to the discussion
in [30, Sect. 4.1].
4 Numerical experiments
Algorithm 1 (geomCG) was implemented in MATLAB version 2012a, using the Tensor Toolbox version 2.5 [4] for handling some of the tensor operations. However, to
attain reasonable performance, it was important to implement operations with sparse
tensors in C and call them via mex interfaces. In particular, this was done for the
evaluation of the objective function (1.1), the computation of the Euclidean gradi-
ent and its projection onto the tangent space (3.1), as well as for the linearized line
search (3.4). For simplicity, we restricted the implementation to the case d = 3. The
source code is freely available under a BSD license and can be downloaded from
http://anchp.epfl.ch.
To measure the convergence during the iteration, Algorithm 1 computes the relative residual
$$\frac{\|P_\Omega X - P_\Omega A\|}{\|P_\Omega A\|}.$$
However, to investigate the reconstruction quality of the algorithm, measuring the
relative residual on the sampling set Ω is not sufficient. For this purpose, we also
measure the relative error $\|P_\Gamma X - P_\Gamma A\| / \|P_\Gamma A\|$ on a random test set Γ of the same
cardinality as Ω.
Unless stated otherwise, we assume that the tensor has equal size in all modes,
n := n1 = n2 = n3 and similarly for the ranks, r := r1 = r2 = r3 . All tests were
performed on a quad-core Intel Xeon E31225, 3.10 GHz, with 8 GB of RAM running
64-Bit Debian 7.0 Linux. Stated calculation times are wall-clock times, excluding the
set-up time of the problem.
A synthetic data tensor A of exact multilinear rank r is created by choosing the entries
of the core tensor C and the basis matrices U1 , U2 , U3 as pseudo-random numbers
from a uniform distribution on [0, 1].
As a first test, we check that the implementation of Algorithm 1 exhibits the same
scaling behaviour per iteration as predicted by the theoretical discussion in Sect. 3.5.
To measure the scaling with regard to the tensor size n, we fix the multilinear rank
to r = (10, 10, 10) and scale the size of the sampling set linearly with the tensor size,
|Ω| = 10n. We perform 10 iterations of our algorithm and repeat the process 10 times
for different randomly chosen datasets. Analogously, we measure the dependence on
the tensor rank by setting the tensor size to n = 300 and fixing the sampling set to
0.1 % of the full tensor.
The results are shown in Fig. 3. We observe that our algorithm indeed scales linearly in the tensor size over a large interval $n \in [100, 3000]$. Even for such large
tensors, the time per iteration step is very low. Plotting the results for the scaling with
regard to the tensor rank, we observe an $O(r^3)$ dependence, in agreement with (3.5).
We compare the reconstruction performance of our algorithm with the hard com-
pletion algorithm by Signoretto et al. [26, Alg. 3], based on the so called inexact
Fig. 3 Time needed per iteration step for various problem sizes. Left: Runtime with fixed rank
r = (10, 10, 10) and varying tensor size n = n1 = n2 = n3 ∈ {100, 150, . . . , 3000}. The size of the sam-
pling set scales linearly with n, |Ω| = 10n. Right: Runtime with fixed tensor size n = (300, 300, 300) and
varying tensor rank r = r1 = r2 = r3 ∈ {20, 25, . . . , 100}. Size of sampling set: 0.1 % of the full tensor.
The dashed line shows a scaling behaviour O(r 3 )
When the sampled entries are perturbed by a noise tensor $E$ with relative noise level $\varepsilon_0 := \|P_\Omega E\| / \|P_\Omega A\|$, the best we can expect from the completed tensor $X^*$ is a relative residual on the level of the noise:
$$\bigl\|P_\Omega X^* - P_\Omega(A + E)\bigr\| / \|P_\Omega A\| \;\approx\; \bigl\|P_\Omega A - P_\Omega(A + E)\bigr\| / \|P_\Omega A\| = \varepsilon_0.$$
To test that the noise does not lead to a misidentification of the rank of the underlying
problem, we compare the case where we take the initial guess on the correct manifold
to an uninformed rank-(1, 1, 1) guess. There, we employ a heuristic rank adaptation
strategy discussed in Sect. 4.3. We show in Fig. 5 that in both cases we can indeed
recover the original data up to the given noise level.
Fig. 4 Convergence curves for different sampling set sizes as functions of iterations and time for our
proposed algorithm (geomCG) and the hard completion algorithm by Signoretto et al. [26]. Tensor size
and multilinear rank fixed to n = 100 and r = (5, 5, 5), respectively (Color figure online)
Fig. 5 Tensor completion from noisy measurements with n = 100, r = (6, 6, 6). The relative size of the
sampling set was fixed to 10 %. The black line corresponds to the noise-free case. The different colors
correspond to the noise levels $\varepsilon_0 \in \{10^{-4}, 10^{-6}, \ldots, 10^{-12}\}$. Left: Results when the underlying rank r is
known. Right: Results for the case of unknown rank of the underlying problem. Due to the rank adaptation
procedure, more iterations are necessary
It is well known that in the matrix case, the number of random samples needed to
exactly recover the original matrix is at least O(nr log n) under standard assump-
tions; see e.g. [6, 13]. In the left plot of Fig. 6, we present numerical experiments
suggesting that a similar statement may hold for the three-dimensional tensor case.
The algorithm is declared converged (and hence yields perfect reconstruction) if the
relative residual drops below 10−6 within 100 iterations.
Fig. 6 Scaling of the sampling set size needed to reconstruct the original tensor of fixed multilin-
ear rank (10, 10, 10). Left: Minimum size of sampling set needed to attain convergence vs. tensor size
n = n1 = n2 = n3 . Right: Phase transition of the convergence speed (4.1). White means fast convergence,
black means no convergence. The line corresponds to O(n log(n))
The right plot of Fig. 6 displays a phase transition of the measured convergence
speed of Algorithm 1, computed from
$$\rho = \left( \frac{\|P_\Gamma X_{k_{\mathrm{end}}} - P_\Gamma A\|}{\|P_\Gamma X_{k_{\mathrm{end}}-10} - P_\Gamma A\|} \right)^{1/10} \in [0, 1]. \tag{4.1}$$
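Given the test-set errors recorded during the iteration, (4.1) is a one-line computation; a minimal MATLAB sketch, assuming the (hypothetical) vector err holds the test-set error after each iteration:

```matlab
% Mean convergence factor over the last 10 iterations, cf. (4.1);
% err(k) is assumed to hold ||P_Gamma X_k - P_Gamma A|| after iteration k.
function rho = conv_factor(err)
    rho = (err(end) / err(end-10))^(1/10);
end
```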
4.3 Applications
In the following, we assess the performance of our algorithm on tensors derived from
applications. In contrast to synthetic data sets, tensors from applications usually do
not posses a clear, well-defined multilinear rank. Often, they exhibit a rather smooth
decay of the singular values in each matricization. In such a setting, Algorithm 1 re-
quires a good initial guess, as directly applying it with a (large) fixed rank r usually
results in severe overfitting. We propose the following heuristic to address this prob-
lem: Starting from a multilinear rank-(1, 1, 1)-approximation, we iteratively increase
the multilinear rank in each mode and rerun our algorithm with the previous result as
initial guess. This procedure is repeated until the prescribed final multilinear rank r
is reached. We increase the multilinear rank every time the current relative change in
the square root of the cost function is smaller than a tolerance δ:
$$\sqrt{f(X_{i-1})} - \sqrt{f(X_i)} < \delta\, \sqrt{f(X_i)}. \tag{4.2}$$
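One possible realization of this heuristic is sketched below. Here, run_fixed_rank stands for Algorithm 1 run at a fixed multilinear rank until the stagnation criterion (4.2) (or a maximum iteration count) triggers, and embed for an embedding of the previous result into the larger manifold used as initial guess; both are placeholder handles, and the schedule by which the mode ranks are increased is one choice among several.

```matlab
% Sketch of the rank-adaptation heuristic of Sect. 4.3 (placeholder handles).
function X = rank_adaptive_completion(run_fixed_rank, embed, r_final, delta)
    r = [1 1 1];                                   % start from multilinear rank (1,1,1)
    X = run_fixed_rank(r, [], delta);
    while any(r < r_final)
        r = min(r + 1, r_final);                   % increase the rank in each mode
        X = run_fixed_rank(r, embed(X, r), delta); % rerun with previous result as guess
    end
end
```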
Table 1 Reconstruction results for “Ribeira” hyperspectral image. The results for frame, mode-3 and
tensor are taken from [27]. geomCG(r1 , r2 , r3 ) denotes the result of Algorithm 1 using a prescribed final
multilinear rank (r1 , r2 , r3 )
for δ should be chosen smaller, at the cost of additional iterations. In the following
numerical experiments, we always include this initialization procedure in the reported
computation times.
Fig. 7 Full hyperspectral image data set “Ribeira” scaled to size (203, 268, 33). Top left: Singular value
decay of each matricization. Top right: The sampled tensor PΩ A with 10 % known entries. Unknown
entries are marked in black. Bottom left: Result of our algorithm with iterative increase of ranks up to a
final rank of r = (15, 15, 6), corresponding to entry geomCG(15, 15, 6) in Table 1. Bottom right: Result of
our algorithm with iterative increase of ranks up to a final rank of r = (65, 65, 7), corresponding to entry
geomCG(65, 65, 7) in Table 1
The singular values of the mode-3 matricization decay considerably faster than those of the first two matricizations (see Fig. 7). This suggests choosing the final mode-1 and mode-2 ranks of the approximation significantly larger
than the mode-3 rank. It can be observed that our algorithm (geomCG) yields very
competitive results, especially in the case where the sampling set is small. There is
one case of overfitting for geomCG(55, 55, 5), marked by a star.
As an example of a function-related tensor, we consider a trivariate function f on [−1, 1]^3, discretized on a uniform tensor grid with mesh width h = 1/100. The function values
are collected in a tensor A ∈ R201×201×201 . In this setting, we assume that the location
of the singularity is known a priori. As f has a cusp at the origin, the information
in A is strongly localized at this point and tensor completion applied naively to A
would not lead to reasonable compression. To avoid this effect, we therefore cut out
a small hypercube [−0.1, 0.1]3 , corresponding to the 21 × 21 × 21 central part of the
discretized tensor. The idea is to not include this region in the sampling set Ω. The
entries corresponding to this region are stored separately and reconstructed exactly
after performing low-rank tensor completion on the remaining region. We therefore
do also not include the central part in the test set Γ when verifying the accuracy of the
completed tensor. The obtained results are shown in Fig. 8. Already sampling 5 % of
the entries gives an accuracy of 10−5 . This would yield a compression ratio of 5.1 %
if we stored the involved entries. However, storing the rank-(5, 5, 5) approximation
along with the central part yields the significantly lower compression ratio of 0.15 %.
As a second application, we consider a parameter-dependent linear system
$$(A_0 + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3)\, x = f, \tag{4.4}$$
as it arises, for example, from the discretization of a PDE with random coefficients after truncating a Karhunen-Loève expansion; see, e.g., [24].
The parameters α are then sampled uniformly on a tensor grid on [−1, 1] × [−1, 1] ×
[−1, 1]. Assuming that we are only interested in the mean of the solution for a specific
set of parameters, this results in the solution tensor X ∈ Rn×n×n , where each entry
of this tensor requires the solution of a discretized PDE (4.4) for one combination
of the discretized (α1 , α2 , α3 ). Hence, evaluating the full tensor is fairly expensive.
Using tensor completion, we sample X at (few) randomly chosen points and try to
approximate the missing entries.
In Fig. 9 we show the results of this approach for m = 50, n = 100, and two differ-
ent choices of αμ . We used the Karhunen-Loève eigenvalues λμ = 5 exp(−2μ) and
λμ = (1 + μ)−2 , respectively. As the second choice results in slower singular value
decays, our algorithm requires more iterations to attain the same accuracy. Using 5 %
of the tensor as a sampling set is in both cases sufficient to recover the original tensor
to good precision. As the sampling set gets smaller, overfitting of the sampling data
is more likely to occur, especially for the second choice of λμ .
5 Conclusions
We have shown that the framework of Riemannian optimization yields a very effec-
tive nonlinear CG method for performing tensor completion. Such a method has also
been suggested in [8]. One of the main contributions in this paper consists of a care-
ful discussion of the algorithmic and implementation details, showing that the method
scales well for large data sets and is competitive with existing methods for tensor completion. On the theoretical side, we have proven that the HOSVD satisfies the properties
of a retraction and discussed the convergence properties of the nonlinear CG method.
The numerical experiments indicate the usefulness of tensor completion not only
for data-related but also for function-related tensors. We feel that this aspect merits
further exploration. To handle high-dimensional applications, the approach consid-
ered in this paper needs to be extended to other SVD-based low-rank tensor formats,
such as the tensor train and the hierarchical Tucker formats, see also [8].
References
1. Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1),
135–158 (2012)
2. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton
University Press, Princeton (2008)
3. Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factorizations for incomplete data.
Chemom. Intell. Lab. Syst. 106, 41–56 (2011)
4. Bader, B.W., Kolda, T.G., et al.: Matlab tensor toolbox version 2.5 (2012). Available from
http://www.sandia.gov/~tgkolda/TensorToolbox/
5. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
6. Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans.
Inf. Theory 56(5), 2053–2080 (2009)
7. Chern, J.L., Dieci, L.: Smoothness and periodicity of some matrix decompositions. SIAM J. Matrix
Anal. Appl. 22(3), 772–792 (2000)
8. Da Silva, C., Herrmann, F.J.: Hierarchical Tucker tensor optimization—applications to tensor com-
pletion. In: Proc. 10th International Conference on Sampling Theory and Applications (2013)
9. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM
J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
10. Foster, D.H., Nascimento, S.M.C., Amano, K.: Information limits on neural identification of colored
surfaces in natural scenes. Vis. Neurosci. 21, 331–336 (2004)
11. Gandy, S., Recht, B., Yamada, I.: Tensor completion and low-n-rank tensor recovery via convex opti-
mization. Inverse Probl. 27(2), 025010 (2011)
12. Grasedyck, L., Kressner, D., Tobler, C.: A literature survey of low-rank tensor approximation tech-
niques. GAMM-Mitt. 36(1), 53–78 (2013)
13. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from noisy entries. J. Mach. Learn. Res.
11, 2057–2078 (2010)
14. Koch, O., Lubich, C.: Dynamical tensor approximation. SIAM J. Matrix Anal. Appl. 31(5), 2360–
2375 (2010)
15. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500
(2009)
16. Liu, Y., Shang, F.: An efficient matrix factorization method for tensor completion. IEEE Signal Pro-
cess. Lett. 20(4), 307–310 (2013)
17. Liu, J., Musialski, P., Wonka, P., Ye, J.: Tensor completion for estimating missing values in visual
data. In: Proc. IEEE 12th International Conference on Computer Vision, pp. 2114–2121 (2009)
18. Ma, Y., Wright, J., Ganesh, A., Zhou, Z., Min, K., Rao, S., Lin, Z., Peng, Y., Chen, M., Wu, L., Can-
dès, E., Li, X.: Low-rank matrix recovery and completion via convex optimization. Survey website.
http://perception.csl.illinois.edu/matrix-rank/. Accessed: 22 April 2013
19. Mishra, B., Meyer, G., Bonnabel, S., Sepulchre, R.: Fixed-rank matrix factorizations and Riemannian
low-rank optimization (2012). arXiv:1209.0430
20. Mu, C., Huang, B., Wright, J., Goldfarb, D.: Square deal: lower bounds and improved relaxations for
tensor recovery (2013). arXiv:1307.5870
21. Ngo, T., Saad, Y.: Scaled gradients on Grassmann manifolds for matrix completion. In: Bartlett, P.,
Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing
Systems, vol. 25, pp. 1421–1429 (2012)
22. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research.
Springer, Berlin (2006)
23. Rauhut, H., Schneider, R., Stojanac, Z.: Low rank tensor recovery via iterative hard thresholding. In:
Proc. 10th International Conference on Sampling Theory and Applications (2013)
24. Schwab, C., Gittelson, C.J.: Sparse tensor discretizations of high-dimensional parametric and stochas-
tic PDEs. Acta Numer. 20, 291–467 (2011)
25. Signoretto, M., De Lathauwer, L., Suykens, J.A.K.: Nuclear norms for tensors and their use for convex
multilinear estimation. Tech. Rep. 10-186, K. U. Leuven (2010)
26. Signoretto, M., Tran Dinh, Q., De Lathauwer, L., Suykens, J.A.K.: Learning with tensors: a frame-
work based on convex optimization and spectral regularization. Tech. Rep. 11-129, K. U. Leuven
(2011)
27. Signoretto, M., Van de Plas, R., De Moor, B., Suykens, J.A.K.: Tensor versus matrix completion: a
comparison with application to spectral data. IEEE Signal Process. Lett. 18(7), 403–406 (2011)
28. Uschmajew, A.: Zur Theorie der Niedrigrangapproximation in Tensorprodukten von Hilberträumen.
Ph.D. thesis, Technische Universität, Berlin (2013)
29. Uschmajew, A., Vandereycken, B.: The geometry of algorithms using hierarchical tensors. Linear
Algebra Appl. 439(1), 133–166 (2013)
30. Vandereycken, B.: Low-rank matrix completion by Riemannian optimization. SIAM J. Optim. 23(2),
1214–1236 (2013)