IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 3, MARCH 2008

Steepest Descent Algorithms for Optimization Under Unitary Matrix Constraint

Traian E. Abrudan, Jan Eriksson, and Visa Koivunen
Abstract—In many engineering applications we deal with constrained optimization problems with respect to complex-valued matrices. This paper proposes a Riemannian geometry approach for optimization of a real-valued cost function J of complex-valued matrix argument W, under the constraint that W is a unitary matrix. We derive steepest descent (SD) algorithms on the Lie group of unitary matrices U(n). The proposed algorithms move towards the optimum along the geodesics, but other alternatives are also considered. We also address the computational complexity and the numerical stability issues, considering both the geodesic and the nongeodesic SD algorithms. The Armijo step size adaptation rule [1] is used similarly to [2], but with reduced complexity. The theoretical results are validated by computer simulations. The proposed algorithms are applied to blind source separation in MIMO systems by using the joint diagonalization approach [3]. We show that the proposed algorithms outperform other widely used algorithms.

Index Terms—Array processing, optimization, source separation, subspace estimation, unitary matrix constraint.
I. INTRODUCTION

CONSTRAINED optimization problems arise in many signal processing applications. One common task is to minimize a cost function with respect to a matrix, under the constraint that the matrix has orthonormal columns. Some typical applications in communications and array signal processing are subspace tracking [4]–[6], blind and constrained beamforming [7]–[9], high-resolution direction finding (e.g., MUSIC and ESPRIT), and generally all subspace-based methods. Another straightforward application is independent component analysis (ICA) [3], [10]–[19]. This type of optimization problem has also been considered in the context of multiple-input multiple-output (MIMO) communication systems [6], [20]–[23]. Most of the existing optimization algorithms are derived for the real-valued case and orthogonal matrices [10], [11], [13]–[15], [17], [24]–[27]. Very often in communications and signal processing applications we are dealing with complex matrices and signals. Consequently, the optimization needs to be performed under the unitary matrix constraint.
Commonly, optimization algorithms employing an orthogonal/unitary matrix constraint minimize a cost function on the space of matrices using a classical steepest descent (SD) algorithm. A separate orthogonalization procedure must be applied after each iteration [12], [20]–[22]. Approaches stemming from the Lagrange multipliers method have also been used to solve such problems [16]. In such approaches, the error criterion contains an extra term that penalizes the deviations from the orthogonality property. Self-stabilized algorithms have been developed to provide more accurate, but still approximate, solutions [17]. Major improvements over the classical methods are obtained by taking into account the geometrical aspects of the optimization problem. Pioneering work by Luenberger [28] and Gabay [29] converts the constrained optimization problem into an unconstrained problem on an appropriate differentiable manifold. An extensive treatment of optimization algorithms with orthogonality constraints is given later by Edelman et al. [24] in a Riemannian geometry context. A non-Riemannian approach has been proposed in [2]; it is a general framework for optimization under the unitary matrix constraint. A more detailed literature review is presented in Section II.

In this paper we derive two generic algorithms stemming from differential geometry. They optimize a real-valued cost function J with respect to a complex-valued matrix W satisfying W^H W = W W^H = I, i.e., they perform optimization subject to the unitary matrix constraint. SD algorithms operating on the Lie group of unitary matrices U(n) are proposed. They move towards the optimum along the locally shortest paths, i.e., geodesics. Geodesics on Riemannian manifolds correspond to the straight lines in Euclidean space. Our motivation to opt for the geodesic algorithms is that on the Lie group of unitary matrices U(n), the geodesics have simple expressions described by the exponential map. We can fully exploit recent developments in computing the matrix exponential needed in the multiplicative update on U(n). The generalized polar decomposition [30] proves to be one of the most computationally efficient methods if implemented in a parallel architecture, or the Cayley transform (CT) [31] otherwise. We also consider other parametrizations proposed in the literature and show that all these parametrizations are numerically equivalent up to a certain approximation order. However, the algorithms differ in terms of computational complexity, which is also addressed in this paper. The proposed generic geodesic algorithms, unlike other parametrizations, can be relatively easily adapted to different problems with varying complexity and strictness of the unitarity property requirements. This is due to the fact that the computation of the matrix exponential function employed in the proposed algorithms is a well-researched problem [32], with some recent progress relevant to the unitary optimization [30]. Moreover, we show that the exponential map is well suited for adapting the step size for the SD method on the unitary group.

Manuscript received January 30, 2007; revised August 13, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sergiy Vorobyov. This work was supported in part by the Academy of Finland and by the GETA Graduate School. The authors are with the SMARAD CoE, Signal Processing Laboratory, Department of Electrical Engineering, Helsinki University of Technology, FIN-02015 HUT, Finland (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2007.908999
This paper is organized as follows. In Section II, an overview of the problem of optimization under the unitary matrix constraint is provided. A brief review of different approaches presented in the literature is given as well. A simple geometric example is used to illustrate the differences among various approaches. In Section III, we derive the Riemannian gradient on the Lie group of unitary matrices U(n) and the corresponding SD algorithms. Equivalence relationships between the proposed algorithms and other algorithms are established in Section IV. The computational complexity and the numerical stability issues are studied in Sections V and VI, respectively. Simulation results are presented in Section VII. The proposed algorithms are used to solve the unitary matrix optimization problem encountered in the joint approximate diagonalization of eigenmatrices (JADE) algorithm [3], which is applied for blind source separation in a MIMO system. Finally, Section VIII concludes the paper.

II. OPTIMIZATION UNDER UNITARY MATRIX CONSTRAINT

In this section, a brief overview of optimization methods under an orthonormal or unitary matrix constraint is provided. Different approaches are reviewed and the key properties of each approach are briefly studied. A simple example is presented to illustrate how each algorithm searches for the optimum.

A. Overview

Most classical optimization methods with a unitary matrix constraint operate on the Euclidean space by using an SD algorithm. The unitary property of the matrix is lost in every iteration, and it needs to be restored in each step. Moreover, the convergence speed is reduced. Other algorithms use a Lagrangian type of optimization, by adding an extra-penalty function which penalizes the deviation from unitarity [16]. These methods suffer from slow convergence and find only an approximate solution in terms of orthonormality. Self-stabilized algorithms provide more accurate solutions [17], [33].

A major drawback of the classical Euclidean SD and Lagrange types of algorithms [12], [16], [20]–[22] is that they do not take into account the special structure of the parameter space where the cost function needs to be optimized. The constrained optimization problem may be formulated as an unconstrained one in a different parameter space called a manifold. Therefore, the space of unitary matrices is considered to be a "constrained surface." Optimizing a cost function on a manifold is often considered [10], [14], [24], [25], [29], [34] as a problem of Riemannian geometry [35]. Algorithms more general than the traditional Riemannian approach are considered in [2].

The second important aspect neglected in classical algorithms is that the unitary matrices are algebraically closed under the multiplication operation, not under addition. Therefore, they form a group under the multiplication operation, which is the Lie group of unitary matrices U(n) [36]. Consequently, by using an iterative algorithm based on an additive update, the unitarity property is lost after each iteration. Even though we are moving along a straight line pointing in the right direction, we depart from the constrained surface in each step. This happens because a Riemannian manifold is a "curved space." The locally length-minimizing curve between two points on the Riemannian manifold is called a geodesic, and it is not a straight line like in the Euclidean space. Several authors [10], [11], [13], [14], [24], [25], [28], [29], [34], [37], [38] have proposed that the search for the optimum should proceed along the geodesics of the constrained surface. Relevant work in Riemannian optimization algorithms may be found in [24], [29], [34], and [38]–[41]. Algorithms considering the real-valued Stiefel and/or Grassmann manifolds have been proposed in [10], [11], [15], [17], [24]–[26], and [42]. Edelman et al. [24] consider the problem of optimization under orthonormal constraints. They propose SD, conjugate gradient, and Newton algorithms along geodesics on the Stiefel and Grassmann manifolds.

A general framework for optimization under unitary matrix constraints is presented in [2]. It does not follow the traditional Riemannian optimization approach. A modified SD algorithm, coupled with Armijo's step size adaptation rule [1], and a modified Newton algorithm are proposed for optimization on both the complex Stiefel and the complex Grassmann manifolds. These algorithms do not employ a geodesic motion, but geodesic motion could be used in the general framework. A local parametrization based on an Euclidean projection of the tangent space onto the manifold is used in [2]. Hence, the computational cost may be reduced. Moreover, it is suggested that the geodesic motion is not the only solution, since there is no direct connection between the Riemannian geometry of the Stiefel (or Grassmann) manifold (i.e., the "constrained surface") and an arbitrary cost function.

The SD algorithms proposed in this paper operate on the Lie group of unitary matrices U(n). We have derived the Riemannian gradient needed in the optimization on U(n). We choose to follow a geodesic motion. This is justified by the desirable property of U(n) that the right multiplication is an isometry with respect to the canonical bi-invariant metric [35]. This allows us to translate the descent direction at any point in the group to the identity element and exploit the fact that the tangent space at identity is the Lie algebra of skew-Hermitian matrices. This leads to lower computational complexity because the argument of the matrix exponential operation is skew-Hermitian. Novel methods for computing the matrix exponential for skew-symmetric matrices recently proposed in [30] and [32] may be exploited. Moreover, we show that using an adaptive step size according to Armijo's rule [1] fits very well to the proposed algorithms.

B. Illustrative Example

We present a rather simple simulation example in order to illustrate how different algorithms operate under the unitary constraint. We consider the Lie group of unit-norm complex numbers U(1), which are the 1 × 1 unitary matrices. The unitary constraint is in this case the unit circle. We minimize a cost function J(w), subject to |w| = 1. Five different algorithms are considered. The first one is the unconstrained SD algorithm on the Euclidean space, with the corresponding update w_{k+1} = w_k − μ γ_k, where γ_k is the Euclidean gradient of J at w_k and μ is the step size.
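As a concrete, purely illustrative complement, the following numpy sketch contrasts the unconstrained additive update with a geodesic, multiplicative update on U(1). The quadratic cost J(w) = |w − 2|² and the gradient convention are our own assumptions for the demo, not the cost used in the paper's figures.

```python
import numpy as np

# Illustrative cost on U(1): J(w) = |w - c|^2 with real c > 1.
# (Assumed for this demo; the paper's example cost is not recoverable.)
c = 2.0
egrad = lambda w: w - c          # Euclidean gradient dJ/dw*

mu = 0.1
w_euc = w_geo = np.exp(1j * 2.5)  # common starting point on the unit circle

for _ in range(50):
    # Unconstrained Euclidean SD: additive update leaves the circle.
    w_euc = w_euc - mu * egrad(w_euc)
    # Geodesic SD on U(1): g = gamma*conj(w) - w*conj(gamma) is purely
    # imaginary (the Lie algebra u(1)), so exp(-mu*g) is a pure rotation.
    g = egrad(w_geo) * np.conj(w_geo) - w_geo * np.conj(egrad(w_geo))
    w_geo = np.exp(-mu * g) * w_geo

print("Euclidean SD: |w| =", abs(w_euc))  # drifts away from 1
print("geodesic SD:  |w| =", abs(w_geo))  # stays 1 to machine precision
```

The additive iterate converges to the unconstrained minimizer w = c and abandons the constraint, while the multiplicative iterate never leaves the unit circle.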
B. Riemannian Structure on U(n)

A differentiable function γ : (−ε, ε) → U(n) represents a curve on the smooth manifold U(n) (see Fig. 2). Let W = γ(0), and let F_W be the set of functions on U(n) that are differentiable at W. The tangent vector to the curve γ at W = γ(0), given by

    γ̇(0) f = (d/dt) f(γ(t)) |_{t=0},   f ∈ F_W,

is a function γ̇(0) : F_W → R.

Fig. 2. Illustration of the tangent space T_W U(n) at the point W, and a tangent vector X ∈ T_W U(n).
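This structure is easy to probe numerically. The sketch below assumes the standard identification of tangent vectors at W with matrices of the form S W, where S is skew-Hermitian (consistent with the Lie algebra used later in the paper); it checks that the curve γ(t) = exp(tS) W stays on U(n) and that its velocity at t = 0 is S W.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4

# Random unitary W via QR of a complex Gaussian matrix.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W, _ = np.linalg.qr(A)

# Random skew-Hermitian S, an element of the Lie algebra u(n).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = (B - B.conj().T) / 2

gamma = lambda t: expm(t * S) @ W   # curve through W = gamma(0)

t = 0.7
err_unitary = np.linalg.norm(gamma(t).conj().T @ gamma(t) - np.eye(n))
h = 1e-6
X_numeric = (gamma(h) - gamma(-h)) / (2 * h)   # numerical velocity at t = 0
err_tangent = np.linalg.norm(X_numeric - S @ W)

print(err_unitary, err_tangent)  # both tiny (about 1e-9 or below)
```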
Fig. 3. The geodesic emanating from identity in the direction of −G, ending at P = exp(−G).
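A minimal sketch of a geodesic SD iteration of this type is given below. It assumes the skew-Hermitian descent direction G = Γ W^H − W Γ^H, where Γ denotes the Euclidean gradient of the cost with respect to the conjugate argument, and the rotational update W ← exp(−μG) W. The quadratic cost, whose minimizer over U(n) is the unitary polar factor of A, is an illustrative choice and not the paper's experiment.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 4

# Illustrative cost: J(W) = ||W - A||_F^2 (assumed demo cost).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
cost = lambda W: np.linalg.norm(W - A) ** 2
egrad = lambda W: W - A                      # dJ/dW* for this cost

W, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))
mu = 0.05
for k in range(200):
    Gamma = egrad(W)
    G = Gamma @ W.conj().T - W @ Gamma.conj().T  # skew-Hermitian direction
    W = expm(-mu * G) @ W                        # geodesic (rotational) step

U, _, Vh = np.linalg.svd(A)
print(cost(W), cost(U @ Vh))                     # approaches the optimum
print(np.linalg.norm(W.conj().T @ W - np.eye(n)))  # unitarity preserved
```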
…the CT corresponds to a first-order diagonal Padé approximation of the matrix exponential (see [32]).

C. The Euclidean Projection Map

Another possibility is to use an Euclidean projection map as a local parametrization, as in [2]. This map projects an arbitrary matrix X onto U(n) at the point Π(X) which is the closest point to X in terms of the Euclidean norm, i.e., Π(X) = arg min_{W ∈ U(n)} ||W − X||_F. The unitary matrix minimizing the above norm can be obtained from the polar decomposition of X as in [47]

    Π(X) = U V^H    (17)

where U and V are the left and the right singular vectors of X, respectively. Equivalently,

    Π(X) = X (X^H X)^{−1/2}.    (18)

Equation (18) is also known as "the symmetric orthogonalization" procedure [12]. In [2], the local parametrizations are more general in the sense that they are chosen for the Stiefel and Grassmann manifolds. The projection operation is computed via the SVD as in (17). For the SD algorithm on the Stiefel manifold [2], the update is of the form W_{k+1} = Π(W_k − μ G_k), where −G_k is the SD direction on the manifold and μ is the step size. According to (18), the update is equivalent to W_{k+1} = (W_k − μ G_k)[(W_k − μ G_k)^H (W_k − μ G_k)]^{−1/2}. By expanding the above expression in a Taylor series we get (19). Considering that W_k is unitary and G_k is a tangent direction, for small μ the three update expressions (15), (16), and (19) become equivalent up to the second order.
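Both forms of the projection are straightforward to verify numerically; a small sketch based on (17) and (18) as given above:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# (17): optimal projection onto U(n) from the SVD X = U S V^H.
U, _, Vh = np.linalg.svd(X)
P_svd = U @ Vh

# (18): "symmetric orthogonalization" X (X^H X)^(-1/2).
P_sym = X @ np.linalg.inv(sqrtm(X.conj().T @ X))

print(np.linalg.norm(P_svd - P_sym))                       # ~1e-15: same map
print(np.linalg.norm(P_svd.conj().T @ P_svd - np.eye(n)))  # result is unitary
```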
D. The Projection Based on the QR-Decomposition

A computationally inexpensive way to approximate the optimal projection (18) of an arbitrary matrix onto U(n) is the QR-decomposition. We show that this is not the optimal projection in terms of the minimum Euclidean distance, but it is accurate enough to be used in practical applications. We establish a connection between the QR-based projection and the optimal projection. Let us consider the QR-decomposition of the arbitrary nonsingular matrix X, given by X = QR, where R is an upper triangular matrix and Q is a unitary matrix. The unitary matrix Q is an approximation of the optimal projection of X onto U(n), i.e., Q ≈ Π(X). The connection between this projection and the optimal projection can be established by using the polar decomposition of the upper-triangular matrix R. We obtain Π(X) = Q W_R, where R = W_R P_R, W_R is unitary, and P_R is positive semidefinite. Therefore, the matrix Q is an approximation of the optimal projection Π(X), and it includes an additional rotation W_R ∈ U(n). The update of the SD algorithm is equal to the unitary factor Q from the QR-decomposition. In other words, if we have W_k − μ G_k = Q_k R_k, then

    W_{k+1} = Q_k.    (20)
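The relation Π(X) = Q W_R can be checked numerically. The sketch below uses scipy.linalg.polar for the polar decomposition of R:

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

Q, R = np.linalg.qr(X)          # X = Q R, Q unitary, R upper triangular
U, _, Vh = np.linalg.svd(X)
P_opt = U @ Vh                  # optimal projection (17)
W_R, P_R = polar(R)             # polar decomposition R = W_R P_R

# Q approximates P_opt only up to the extra rotation W_R:
print(np.linalg.norm(P_opt - Q @ W_R))   # ~1e-15
print(np.linalg.norm(P_opt - Q))         # generally not small
```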
The equivalence up to the first order between the update (20) and the other update expressions (15), (16), and (19) is obtained by expanding the columns of the matrix Q_k from the Gram-Schmidt process separately in a Taylor series (proof available on request), which yields (21). The equivalence may extend to higher orders, but this remains to be studied.
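The first-order agreement can also be observed numerically: for a tangent step W − μGW with skew-Hermitian G, halving μ should shrink the gap between the projection-based and the QR-based updates by roughly a factor of four. A sketch, using the Gram-Schmidt sign convention (positive real diagonal of R):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(4)
n = 4
W, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = (B - B.conj().T) / 2        # skew-Hermitian direction factor

def update_op(mu):
    # Projection-based update: unitary polar factor of the Euclidean step.
    U_, _ = polar(W - mu * G @ W)
    return U_

def update_qr(mu):
    # QR-based update (20); rescale columns so R has positive diagonal.
    Q, R = np.linalg.qr(W - mu * G @ W)
    d = np.diag(R)
    return Q @ np.diag(d / np.abs(d))

for mu in (1e-1, 5e-2, 2.5e-2):
    print(mu, np.linalg.norm(update_op(mu) - update_qr(mu)))
    # gap shrinks roughly like mu**2, i.e., first-order equivalence
```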
V. COMPUTATIONAL COMPLEXITY

In this section, we evaluate the computational complexity of the SD algorithms on U(n) by considering separately the geodesic and the nongeodesic SD algorithms. The proposed geodesic SD algorithms use a rotational update of the form W_{k+1} = R_k W_k, and the rotation matrix R_k is computed via the matrix exponential. We review the variety of algorithms available in the literature for calculating the matrix exponential in the proposed algorithms. Details are given for the matrix exponential algorithms with the most appealing properties. The nongeodesic SD algorithms are based on an update expression of the form W_{k+1} = Π(W_k − μ G_k), and the computational complexity of the different cases is described. The cost of adapting the step size using the Armijo rule is also evaluated. The SD method on U(n) involves an overall complexity of O(n³) flops¹ per iteration. Algorithms like conjugate gradient or the Newton algorithm are expected to provide a faster convergence, but their complexity is also expected to be higher. Moreover, a Newton algorithm is more likely to converge to stationary points other than local minima.

¹One "flop" is defined as a complex addition or a complex multiplication. An operation of the form ab + c, with a, b, c ∈ C, is equivalent to two flops. A simple multiplication of two n × n matrices requires 2n³ flops. This is a quick evaluation of the computational complexity, not necessarily proportional to the computational speed.

A. Geodesic SD Algorithms on U(n)

In general, the geodesic motion on manifolds is computationally expensive. In the case of U(n), the complexity is reduced even though it requires the computation of the matrix exponential. We are interested in the special case of the matrix exponential of the form exp(−μG), where μ > 0 and G is a skew-Hermitian matrix. Obviously, finding the exponential of a skew-Hermitian matrix has lower complexity than that of a general matrix. The matrix exponential operation maps the skew-Hermitian matrices from the Lie algebra u(n) into unitary matrices which reside on the Lie group U(n). Several alternatives for approximating the matrix exponential have been proposed in [30], [32], [48], and [49]. In general, the term "approximation" may refer to two different things. The first kind of approximation maps the elements of the Lie algebra exactly into the Lie group, and the approximation takes place only in terms of deviation "within the constrained surface." Among the most efficient methods from this category are the diagonal Padé approximation [32], the Generalized Polar Decomposition [30], [48], and the technique of coordinates of the second kind [49]. The second category includes methods for which the resulting elements do not reside on the group anymore, i.e., R_k ∉ U(n). The most popular methods belonging to this category are the truncated Taylor series and the nondiagonal Padé approximation. They do not preserve the algebraic properties, but they still provide reasonable performance in some applications [50]. Their accuracy may be improved by using them together with the scaling and squaring procedure.

1) Padé Approximation of the Matrix Exponential: This is [32, Method 2], and together with scaling and squaring [32, Method 3] it is considered to be one of the most efficient methods for approximating a matrix exponential. For normal matrices (i.e., matrices which satisfy G G^H = G^H G), the Padé approximation prevents the round-off error accumulation. The skew-Hermitian matrices are normal matrices; therefore, they enjoy this benefit. Because we deal with an SD algorithm on U(n), we are also concerned about preserving the Lie algebraic properties. The diagonal Padé approximation preserves the unitarity property accurately. The Padé approximation together with scaling and squaring supposes the choice of the Padé approximation order p and the scaling and squaring exponent s to get the best approximant given the approximation accuracy. See [32] for information on choosing the pair (p, s) optimally. The complexity of this approximation is of order n³ flops. The drawback of the Padé method when used together with the scaling and squaring procedure is that if the norm of the argument is large, the computational efficiency decreases due to the repeated squaring.
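For instance, scipy.linalg.expm (like Matlab's expm) implements a diagonal Padé approximation with scaling and squaring, so the group-preservation property for skew-Hermitian arguments is easy to observe:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = (B - B.conj().T) / 2   # skew-Hermitian: normal, so round-off stays benign

for mu in (0.01, 1.0, 100.0):   # large norms trigger more squaring steps
    R = expm(-mu * G)
    dev = np.linalg.norm(R.conj().T @ R - np.eye(n))
    print(mu, dev)              # deviation from unitarity stays tiny
```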
2) Approximation of the Matrix Exponential via the Generalized Polar Decomposition (GPD): The GPD method, recently proposed in [30], is consistent with the Lie group structure, as it maps the elements of the Lie algebra exactly into the corresponding Lie group. The method lends itself to implementation in parallel architectures, and its flop count stays of order n³ [30] regardless of the approximation order. It may not be the most efficient implementation in terms of the flop count, but the algorithm has potential for highly parallel implementation. GPD algorithms based on splitting techniques have also been proposed in [48]. The corresponding approximation is less complex than the one in [30] for the second and the third order. The second-order approximation requires the same amount of computation as the CT. Other efficient approximations in a Lie-algebraic setting have been considered in [49] by using the technique of coordinates of the second kind (CSK). A second-order CSK approximant requires a comparable O(n³) flop count.
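For comparison, the CT itself is a one-solve map that is exactly unitary for skew-Hermitian arguments and agrees with the exponential to second order. The sketch below assumes the standard Cayley form R = (I + (μ/2)G)^{−1}(I − (μ/2)G) for exp(−μG):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = (B - B.conj().T) / 2
I = np.eye(n)

def cayley(mu):
    # One linear solve instead of a full expm; exactly unitary for skew G.
    return np.linalg.solve(I + 0.5 * mu * G, I - 0.5 * mu * G)

for mu in (1e-1, 5e-2):
    R = cayley(mu)
    print(np.linalg.norm(R.conj().T @ R - I),   # unitary to round-off
          np.linalg.norm(R - expm(-mu * G)))    # approximation error ~ mu**3
```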
B. Nongeodesic SD Algorithms on U(n)

This category includes local parametrizations derived from a projection operator which is used to map arbitrary matrices into U(n). The optimal projection and an approximation of it are considered.

1) Optimal Projection: The projection that minimizes the Euclidean distance between the arbitrary matrix X and a matrix W ∈ U(n) may be computed in different ways. By using the SVD, the computation of the projection requires O(n³) flops, and by using the procedure (18) it likewise requires O(n³) flops.

2) Approximation of the Optimal Projection: This method is the most inexpensive approximation of the optimal projection, being based on the QR-decomposition of the matrix X. It requires only the unitary matrix Q, which is an orthonormal basis of the range space of X. This can be done by using Householder reflections, Givens rotations, or the Gram-Schmidt procedure [32]. The most computationally efficient and numerically stable approach is the modified Gram-Schmidt procedure, which has the lowest flop count among these alternatives.

TABLE III
THE COMPLEXITY (IN FLOPS) OF COMPUTING THE LOCAL PARAMETRIZATION IN U(n)

In Table III, we summarize the complexity² of computing the local parametrizations for the geodesic and the nongeodesic methods, respectively. The geodesic methods include the diagonal Padé approximation with scaling and squaring (DPA+SS), the diagonal Padé approximation of type (p, s) = (1, 0) [32], i.e., the Cayley transform (CT), and the Generalized Polar Decomposition with reduction to tridiagonal form (GPD-IZ) [30] and without reduction to the tridiagonal form (GPD-ZMK) [48]. All methods have an approximation order of two. The nongeodesic methods include the optimal projection (OP) and its approximation (AOP).

²Only dominant terms are reported, i.e., O(n³).

C. The Cost of Using an Adaptive Step Size

In this subsection, we analyze the computational complexity of adapting the step size with the Armijo rule. The total computational cost is given by the complexity of computing the local parametrization and the additional complexity of selecting the step size. Therefore, the step size adaptation is a critical aspect to be considered. We consider again the geodesic SD algorithms and the nongeodesic SD algorithms, respectively. We show that the geodesic methods may reduce the complexity of the step size adaptation.

1) The Geodesic SD Algorithms: Since the step size evolves in a dyadic basis, the geodesic methods are very suitable for the Armijo step. This is due to the fact that doubling the step size does not require any expensive computation, just squaring the rotation matrix as in the scaling and squaring procedure. For normal matrices, the computation of the matrix exponential via matrix squaring prevents the round-off error accumulation [32]. An Armijo type of geodesic SD algorithm enjoys this benefit, since the argument of the matrix exponential is skew-Hermitian. Moreover, when the step size is halved, the corresponding rotation matrix may be available from the scaling and squaring procedure, which is often combined with other methods for approximating the matrix exponential. This allows reducing the complexity because the expensive operation may often be avoided.

2) The Nongeodesic SD Algorithms: The nongeodesic methods compute the update by projecting a matrix into U(n). Unfortunately, the Armijo step size adaptation is relatively expensive in this case. The main reason is that the update W_{k+1} = Π(W_k − μ G_k) and the one corresponding to the double step size, Π(W_k − 2μ G_k), do not have a straightforward relationship such as squaring the rotation matrix for the geodesic methods. Thus, the projection operation needs to be computed multiple times. Moreover, even keeping the step size constant involves the computation of the projection twice…
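A condensed sketch of this dyadic step-size logic for the geodesic update is given below. It is our own simplified Armijo-type rule, not the paper's exact procedure; the point is that doubling the step reuses the already computed rotation through a single matrix squaring.

```python
import numpy as np
from scipy.linalg import expm

def armijo_geodesic_step(W, G, cost, mu, c=0.5):
    """One geodesic SD step with dyadic, Armijo-style step-size control.
    W: current unitary iterate; G: skew-Hermitian gradient at W;
    cost: callable J(W); mu: initial step size. Simplified sketch."""
    g2 = np.linalg.norm(G) ** 2
    J0 = cost(W)
    R = expm(-mu * G)
    # Halve the step until sufficient decrease holds. (Halving recomputes
    # expm here; with scaling and squaring, the half-step rotation is often
    # already available as an intermediate product.)
    while cost(R @ W) > J0 - c * mu * g2 and mu > 1e-12:
        mu *= 0.5
        R = expm(-mu * G)
    # Double the step while the doubled step still decreases enough:
    # doubling needs only one matrix squaring, R <- R @ R.
    while cost(R @ R @ W) <= J0 - 2.0 * c * mu * g2:
        R = R @ R
        mu *= 2.0
    return R @ W, mu
```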
Fig. 5. The performance measures are the JADE criterion J(W) (24), the Amari distance d(W), and the unitarity criterion Δ(W) (22) versus the iteration step. The Riemannian SD algorithm outperforms the conventional methods. The horizontal thick dotted line represents the solution of the original JADE algorithm [3].

Fig. 6. The JADE criterion J(W) (24), the Amari distance, and the unitarity criterion (22) versus the iteration step. A comparison between different local parametrizations on U(n): the geodesic algorithms (continuous line) versus the nongeodesic algorithms (dashed line). For the geodesic algorithms the exponential map is computed by using three different methods: Matlab's diagonal Padé approximation with scaling and squaring (DPA+SS), GPD-IZ [30], and CT. For the nongeodesic algorithms the projection operation is computed with two different methods: the OP via Matlab's svd function and the approximation of it (AOP) via QR decomposition. The horizontal thick dotted line in subplots (a) and (b) represents the solution of the original JADE algorithm [3].

…methods, i.e., the Euclidean SD with enforcing unitarity and the extra-penalty method. The Euclidean SD and extra-penalty methods do not operate in an appropriate parameter space, and the convergence speed is decreased. All SD algorithms satisfy the unitary constraint perfectly, except for the extra-penalty method, which also achieves the lowest convergence speed. This is due to the fact that an optimum scalar weighting parameter may not exist. The unitary matrix constraint is equivalent to n² smooth real Lagrangian constraints; therefore, more parameters could be used. However, computing these parameters may be computationally expensive and/or nontrivial even in the case of U(1), like the example presented in Section II-B. The accuracy of the solution in terms of the unitary constraint is shown in subplot c) of Fig. 5, considering the criterion (22) versus the number of iterations.

We will next analyze how the choice of the local parametrization affects the performance of the SD algorithms. The results are compared in terms of convergence speed and the accuracy of satisfying the unitary constraint. Two classes of Riemannian SD algorithms are considered. The first one includes the geodesic SD algorithms, and the second one includes the nongeodesic SD algorithms. For the geodesic SD algorithms the exponential map is computed by three different methods: Matlab's expm function, which uses the diagonal Padé approximation with scaling and squaring (DPA+SS) [32], the Generalized Polar Decomposition of order four by Iserles and Zanna (GPD-IZ) [30], and the CT. The nongeodesic SD algorithms are based on a projection type of local parametrization. The OP is computed via the SVD [2] by using the Matlab svd function, and the approximation of the optimal projection (AOP) based on the QR algorithm is computed via the modified Gram-Schmidt procedure [54]. In terms of convergence speed, both the geodesic and the nongeodesic SD algorithms such as [2] and [19] have similar performance, regardless of the local parametrization, as shown in subplots a) and b) of Fig. 6. Also in terms of the unitarity criterion, all algorithms provide good performance. The solution of the original JADE algorithm (represented by the horizontal thick dotted line in Fig. 6) is achieved in less than 10 iterations for all SD algorithms, regardless of the local parametrization.

In conclusion, the choice of the local parametrization is made according to the computational complexity and numerical stability. An Armijo step size rule is very suitable for the geodesic algorithms, i.e., using the exponential map as a local parametrization. If implemented in parallel architecture the…
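For reference, the three performance measures can be sketched in code. The exact forms of the JADE criterion (24) and the unitarity criterion (22) are not recoverable from this extract, so the off-diagonality and deviation measures below are standard choices and should be read as assumptions; the Amari distance follows its usual definition.

```python
import numpy as np

def jade_criterion(W, cumulant_mats):
    """Off-diagonality of the rotated eigenmatrices: a standard
    joint-diagonalization criterion (assumed form of (24))."""
    total = 0.0
    for M in cumulant_mats:
        T = W.conj().T @ M @ W
        total += np.linalg.norm(T - np.diag(np.diag(T))) ** 2
    return total

def amari_distance(P):
    """Amari performance index of the global system P
    (zero iff P is a scaled permutation matrix)."""
    Q = np.abs(P)
    n = Q.shape[0]
    rows = (Q / Q.max(axis=1, keepdims=True)).sum() - n
    cols = (Q / Q.max(axis=0, keepdims=True)).sum() - n
    return (rows + cols) / (2 * n * (n - 1))

def unitarity_deviation(W):
    """Deviation from the unitary constraint (assumed form of (22))."""
    n = W.shape[0]
    return np.linalg.norm(W @ W.conj().T - np.eye(n)) ** 2
```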
[12] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[13] S. Fiori, A. Uncini, and F. Piazza, "Application of the MEC network to principal component analysis and source separation," in Proc. Int. Conf. Artif. Neural Netw., 1997, pp. 571–576.
[14] M. D. Plumbley, "Geometrical methods for non-negative ICA: Manifolds, Lie groups, toral subalgebras," Neurocomput., vol. 67, pp. 161–197, 2005.
[15] A. Cichocki and S.-I. Amari, Adaptive Blind Signal and Image Processing. New York: Wiley, 2002.
[16] L. Wang, J. Karhunen, and E. Oja, "A bigradient optimization approach for robust PCA, MCA and source separation," in Proc. IEEE Conf. Neural Netw., Nov. 27–Dec. 1, 1995, vol. 4, pp. 1684–1689.
[17] S. C. Douglas, "Self-stabilized gradient algorithms for blind source separation with orthogonality constraints," IEEE Trans. Neural Netw., vol. 11, pp. 1490–1497, Nov. 2000.
[18] S.-I. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, no. 2, pp. 251–276, 1998.
[19] M. Nikpour, J. H. Manton, and G. Hori, "Algorithms on the Stiefel manifold for joint diagonalisation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2002, vol. 2, pp. 1481–1484.
[20] C. B. Papadias, "Globally convergent blind source separation based on a multiuser kurtosis maximization criterion," IEEE Trans. Signal Process., vol. 48, pp. 3508–3519, Dec. 2000.
[21] P. Sansrimahachai, D. Ward, and A. Constantinides, "Multiple-input multiple-output least-squares constant modulus algorithms," in Proc. IEEE Global Telecommun. Conf., Dec. 1–5, 2003, vol. 4, pp. 2084–2088.
[22] C. B. Papadias and A. M. Kuzminskiy, "Blind source separation with randomized Gram-Schmidt orthogonalization for short burst systems," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 17–21, 2004, vol. 5, pp. 809–812.
[23] J. Lu, T. N. Davidson, and Z. Luo, "Blind separation of BPSK signals using Newton's method on the Stiefel manifold," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2003, vol. 4, pp. 301–304.
[24] A. Edelman, T. Arias, and S. Smith, "The geometry of algorithms with orthogonality constraints," SIAM J. Matrix Anal. Applicat., vol. 20, no. 2, pp. 303–353, 1998.
[25] S. Fiori, "Stiefel-Grassman Flow (SGF) learning: Further results," in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw., Jul. 24–27, 2000, vol. 3, pp. 343–348.
[26] J. H. Manton, R. Mahony, and Y. Hua, "The geometry of weighted low-rank approximations," IEEE Trans. Signal Process., vol. 51, pp. 500–514, Feb. 2003.
[27] S. Fiori, "Quasi-geodesic neural learning algorithms over the orthogonal group: A tutorial," J. Mach. Learn. Res., vol. 1, pp. 1–42, Apr. 2005.
[28] D. G. Luenberger, "The gradient projection method along geodesics," Manage. Sci., vol. 18, pp. 620–631, 1972.
[29] D. Gabay, "Minimizing a differentiable function over a differential manifold," J. Optim. Theory Applicat., vol. 37, pp. 177–219, Jun. 1982.
[30] A. Iserles and A. Zanna, "Efficient computation of the matrix exponential by general polar decomposition," SIAM J. Numer. Anal., vol. 42, pp. 2218–2256, Mar. 2005.
[31] I. Yamada and T. Ezaki, "An orthogonal matrix optimization by dual Cayley parametrization technique," in Proc. ICA, 2003, pp. 35–40.
[32] C. Moler and C. van Loan, "Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later," SIAM Rev., vol. 45, no. 1, pp. 3–49, 2003.
[33] S. Douglas and S.-Y. Kung, "An ordered-rotation kuicnet algorithm for separating arbitrarily-distributed sources," in Proc. IEEE Int. Conf. Independ. Compon. Anal. Signal Separat., Aussois, France, Jan. 1999, pp. 419–425.
[34] C. Udriste, Convex Functions and Optimization Methods on Riemannian Manifolds (Mathematics and Its Applications). Boston, MA: Kluwer Academic, 1994.
[35] M. P. do Carmo, Riemannian Geometry (Mathematics: Theory and Applications). Boston, MA: Birkhauser, 1992.
[36] A. Knapp, Lie Groups Beyond an Introduction (Progress in Mathematics, vol. 140). Boston, MA: Birkhauser, 1996.
[37] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.
[38] S. T. Smith, "Optimization techniques on Riemannian manifolds," Fields Inst. Commun., vol. 3, pp. 113–136, 1994.
[39] R. W. Brockett, "Least squares matching problems," Linear Algebra Applicat., vol. 122/123/124, pp. 761–777, 1989.
[40] R. W. Brockett, "Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems," Linear Algebra Applicat., vol. 146, pp. 79–91, 1991.
[41] B. Owren and B. Welfert, "The Newton iteration on Lie groups," BIT Numer. Math., vol. 40, pp. 121–145, Mar. 2000.
[42] P.-A. Absil, R. Mahony, and R. Sepulchre, "Riemannian geometry of Grassmann manifolds with a view on algorithmic computation," Acta Applicandae Mathematicae, vol. 80, no. 2, pp. 199–220, 2004.
[43] S. G. Krantz, Function Theory of Several Complex Variables, 2nd ed. Pacific Grove, CA: Wadsworth and Brooks/Cole Advanced Books and Software, 1992.
[44] S. Smith, "Statistical resolution limits and the complexified Cramér-Rao bound," IEEE Trans. Signal Process., vol. 53, pp. 1597–1609, May 2005.
[45] D. H. Brandwood, "A complex gradient operator and its applications in adaptive array theory," Inst. Elect. Eng. Proc., Parts F and H, vol. 130, pp. 11–16, Feb. 1983.
[46] Y. Yang, "Optimization on Riemannian manifold," in Proc. 38th Conf. Decision Contr., Phoenix, AZ, Dec. 1999, pp. 888–893.
[47] N. J. Higham, "Matrix nearness problems and applications," in Applications of Matrix Theory, M. J. C. Gover and S. Barnett, Eds. Oxford, U.K.: Oxford Univ. Press, 1989, pp. 1–27.
[48] A. Zanna and H. Z. Munthe-Kaas, "Generalized polar decomposition for the approximation of the matrix exponential," SIAM J. Matrix Anal., vol. 23, pp. 840–862, Jan. 2002.
[49] E. Celledoni and A. Iserles, "Methods for approximation of a matrix exponential in a Lie-algebraic setting," IMA J. Numer. Anal., vol. 21, no. 2, pp. 463–488, 2001.
[50] T. Abrudan, J. Eriksson, and V. Koivunen, "Optimization under unitary matrix constraint using approximate matrix exponential," in Conf. Rec. 39th Asilomar Conf. Signals, Syst., Comput., Oct. 28–Nov. 1, 2005.
[51] P. Sansrimahachai, D. Ward, and A. Constantinides, "Blind source separation for BLAST," in Proc. 14th Int. Conf. Digital Signal Process., Jul. 1–3, 2002, vol. 1, pp. 139–142.
[52] J. H. Manton, "On the role of differential geometry in signal processing," in Proc. Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. 5, pp. 1021–1024.
[53] P. Absil and K. A. Gallivan, "Joint diagonalization on the oblique manifold for independent component analysis," DAMTP, Univ. Cambridge, Cambridge, U.K., Tech. Rep. NA2006/01, 2006 [Online]. Available: http://www.damtp.cam.ac.uk/user/na/reports.html
[54] G. H. Golub and C. van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1996.

Traian E. Abrudan (S'02) received the M.Sc. degree from the Technical University of Cluj-Napoca, Romania, in 2000.
Since 2001, he has been with the Signal Processing Laboratory, Helsinki University of Technology (HUT), Finland. He is a Ph.D. student with the Electrical and Communications Engineering Department, HUT. Since 2005, he has been a member of GETA, Graduate School in Electronics, Telecommunications and Automation. His current research interests include statistical signal processing and optimization algorithms for wireless communications with emphasis on MIMO and multicarrier systems.

Jan Eriksson (M'04) received the M.Sc. degree in mathematics from the University of Turku, Finland, in 2000, and the D.Sc. (Tech) degree (with honors) in signal processing from Helsinki University of Technology (HUT), Finland, in 2004.
Since 2005, he has been working as a postdoctoral researcher with the Academy of Finland. His research interests are in blind signal processing, stochastic modeling, constrained optimization, digital communication, and information theory.
Visa Koivunen (S'87–M'93–SM'98) received the D.Sc. (Tech) degree with honors from the Department of Electrical Engineering, University of Oulu, Finland. He received the primus doctor (best graduate) award among the doctoral graduates during 1989–1994.
From 1992 to 1995, he was a visiting researcher at the University of Pennsylvania, Philadelphia. In 1996, he held a faculty position with the Department of Electrical Engineering, University of Oulu. From August 1997 to August 1999, he was an Associate Professor with the Signal Processing Laboratory, Tampere University of Technology, Finland. Since 1999, he has been a Professor of Signal Processing with the Department of Electrical and Communications Engineering, Helsinki University of Technology (HUT), Finland. He is one of the Principal Investigators in the Smart and Novel Radios (SMARAD) Center of Excellence in Radio and Communications Engineering nominated by the Academy of Finland. Since 2003, he has also been an adjunct full professor with the University of Pennsylvania. During his sabbatical leave (2006–2007), he was the Nokia Visiting Fellow at the Nokia Research Center, as well as a Visiting Fellow at Princeton University, Princeton, NJ. His research interests include statistical, communications, and sensor array signal processing. He has published more than 200 papers in international scientific conferences and journals.
Dr. Koivunen coauthored the papers receiving the Best Paper Award in IEEE PIMRC 2005, EUSIPCO 2006, and EuCAP 2006. He served as an Associate Editor for IEEE SIGNAL PROCESSING LETTERS. He is a member of the editorial board for the Signal Processing journal and the Journal of Wireless Communication and Networking. He is also a member of the IEEE Signal Processing for Communication Technical Committee (SPCOM-TC). He was the general chair of the IEEE SPAWC (Signal Processing Advances in Wireless Communication) conference held in Helsinki, June 2007.