
Steepest Descent Algorithms for Optimization Under Unitary Matrix Constraint

Traian E. Abrudan, Student Member, IEEE, Jan Eriksson, Member, IEEE, and Visa Koivunen, Senior Member, IEEE

IEEE Transactions on Signal Processing, vol. 56, no. 3, March 2008

Abstract—In many engineering applications we deal with constrained optimization problems with respect to complex-valued matrices. This paper proposes a Riemannian geometry approach for optimization of a real-valued cost function $\mathcal{J}$ of a complex-valued matrix argument $\mathbf{W}$, under the constraint that $\mathbf{W}$ is an $n \times n$ unitary matrix. We derive steepest descent (SD) algorithms on the Lie group of unitary matrices $U(n)$. The proposed algorithms move towards the optimum along the geodesics, but other alternatives are also considered. We also address the computational complexity and the numerical stability issues considering both the geodesic and the nongeodesic SD algorithms. The Armijo step size adaptation rule [1] is used similarly to [2], but with reduced complexity. The theoretical results are validated by computer simulations. The proposed algorithms are applied to blind source separation in MIMO systems by using the joint diagonalization approach [3]. We show that the proposed algorithms outperform other widely used algorithms.

Index Terms—Array processing, optimization, source separation, subspace estimation, unitary matrix constraint.

I. INTRODUCTION

CONSTRAINED optimization problems arise in many signal processing applications. One common task is to minimize a cost function with respect to a matrix, under the constraint that the matrix has orthonormal columns. Some typical applications in communications and array signal processing are subspace tracking [4]–[6], blind and constrained beamforming [7]–[9], high-resolution direction finding (e.g., MUSIC and ESPRIT), and generally all subspace-based methods. Another straightforward application is independent component analysis (ICA) [3], [10]–[19]. This type of optimization problem has also been considered in the context of multiple-input multiple-output (MIMO) communication systems [6], [20]–[23]. Most of the existing optimization algorithms are derived for the real-valued case and orthogonal matrices [10], [11], [13]–[15], [17], [24]–[27]. Very often in communications and signal processing applications we deal with complex matrices and signals. Consequently, the optimization needs to be performed under a unitary matrix constraint.

Commonly, optimization algorithms employing an orthogonal/unitary matrix constraint minimize a cost function on the space of matrices using a classical steepest descent (SD) algorithm. A separate orthogonalization procedure must be applied after each iteration [12], [20]–[22]. Approaches stemming from the Lagrange multipliers method have also been used to solve such problems [16]. In such approaches, the error criterion contains an extra term that penalizes deviations from the orthogonality property. Self-stabilized algorithms have been developed to provide more accurate, but still approximate, solutions [17]. Major improvements over the classical methods are obtained by taking into account the geometrical aspects of the optimization problem. Pioneering work by Luenberger [28] and Gabay [29] converts the constrained optimization problem into an unconstrained problem on an appropriate differentiable manifold. An extensive treatment of optimization algorithms with orthogonality constraints is given later by Edelman et al. [24] in a Riemannian geometry context. A non-Riemannian approach has been proposed in [2], which is a general framework for optimization under unitary matrix constraint. A more detailed literature review is presented in Section II.

In this paper we derive two generic algorithms stemming from differential geometry. They optimize a real-valued cost function $\mathcal{J}(\mathbf{W})$ with respect to a complex-valued matrix $\mathbf{W} \in \mathbb{C}^{n \times n}$ satisfying $\mathbf{W}^H\mathbf{W} = \mathbf{W}\mathbf{W}^H = \mathbf{I}$, i.e., they perform optimization subject to the unitary matrix constraint. SD algorithms operating on the Lie group of unitary matrices $U(n)$ are proposed. They move towards the optimum along the locally shortest paths, i.e., geodesics. Geodesics on Riemannian manifolds correspond to straight lines in Euclidean space. Our motivation to opt for the geodesic algorithms is that on the Lie group of unitary matrices $U(n)$, the geodesics have simple expressions described by the exponential map. We can fully exploit recent developments in computing the matrix exponential needed in the multiplicative update on $U(n)$. The generalized polar decomposition [30] proves to be one of the most computationally efficient methods if implemented in a parallel architecture, and the Cayley transform (CT) [31] otherwise. We also consider other parametrizations proposed in the literature and show that all these parametrizations are numerically equivalent up to a certain approximation order. However, the algorithms differ in terms of computational complexity, which is also addressed in this paper. The proposed generic geodesic algorithms, unlike other parametrizations, can be relatively easily adapted to different problems with varying complexity and strictness of the unitarity requirements. This is due to the fact that the computation of the matrix exponential function employed in the proposed algorithms is a well-researched problem [32], with some recent progress relevant to unitary optimization [30]. Moreover, we show that the exponential map is well suited for adapting the step size for the SD method on the unitary group.

Manuscript received January 30, 2007; revised August 13, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sergiy Vorobyov. This work was supported in part by the Academy of Finland and by the GETA Graduate School. The authors are with the SMARAD CoE, Signal Processing Laboratory, Department of Electrical Engineering, Helsinki University of Technology, FIN-02015 HUT, Finland (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2007.908999


This paper is organized as follows. In Section II, an overview of the problem of optimization under unitary matrix constraint is provided. A brief review of different approaches presented in the literature is given as well. A simple geometric example is used to illustrate the differences among various approaches. In Section III, we derive the Riemannian gradient on the Lie group of unitary matrices and the corresponding SD algorithms. Equivalence relationships between the proposed algorithms and other algorithms are established in Section IV. The computational complexity and the numerical stability issues are studied in Sections V and VI, respectively. Simulation results are presented in Section VII. The proposed algorithms are used to solve the unitary matrix optimization problem encountered in the joint approximate diagonalization of eigenmatrices (JADE) algorithm [3], which is applied for blind source separation in a MIMO system. Finally, Section VIII concludes the paper.

II. OPTIMIZATION UNDER UNITARY MATRIX CONSTRAINT

In this section, a brief overview of optimization methods under orthonormal or unitary matrix constraint is provided. Different approaches are reviewed and the key properties of each approach are briefly studied. A simple example is presented to illustrate how each algorithm searches for the optimum.

A. Overview

Most classical optimization methods with a unitary matrix constraint operate on the Euclidean space by using an SD algorithm. The unitary property of the matrix is lost in every iteration, and it needs to be restored in each step. Moreover, the convergence speed is reduced. Other algorithms use a Lagrangian type of optimization, adding an extra penalty function which penalizes the deviation from unitarity [16]. These methods suffer from slow convergence and find only an approximate solution in terms of orthonormality. Self-stabilized algorithms provide more accurate solutions [17], [33].

A major drawback of the classical Euclidean SD and Lagrange types of algorithms [12], [16], [20]–[22] is that they do not take into account the special structure of the parameter space where the cost function needs to be optimized. The constrained optimization problem may be formulated as an unconstrained one in a different parameter space called a manifold. Therefore, the space of unitary matrices is considered to be a "constrained surface." Optimizing a cost function on a manifold is often considered [10], [14], [24], [25], [29], [34] as a problem of Riemannian geometry [35]. Algorithms more general than the traditional Riemannian approach are considered in [2].

The second important aspect neglected in classical algorithms is that the unitary matrices are algebraically closed under the multiplication operation, not under addition. Therefore, they form a group under the multiplication operation, which is the Lie group of unitary matrices $U(n)$ [36]. Consequently, by using an iterative algorithm based on an additive update, the unitarity property is lost after each iteration. Even though we are moving along a straight line pointing in the right direction, we depart from the constrained surface in each step. This happens because a Riemannian manifold is a "curved space." The locally length-minimizing curve between two points on the Riemannian manifold is called a geodesic, and it is not a straight line as in Euclidean space. Several authors [10], [11], [13], [14], [24], [25], [28], [29], [34], [37], [38] have proposed that the search for the optimum should proceed along the geodesics of the constrained surface. Relevant work in Riemannian optimization algorithms may be found in [24], [29], [34], and [38]–[41]. Algorithms considering the real-valued Stiefel and/or Grassmann manifolds have been proposed in [10], [11], [15], [17], [24]–[26], and [42]. Edelman et al. [24] consider the problem of optimization under orthonormal constraints. They propose SD, conjugate gradient, and Newton algorithms along geodesics on Stiefel and Grassmann manifolds.

A general framework for optimization under unitary matrix constraints is presented in [2]. It does not follow the traditional Riemannian optimization approach. A modified SD algorithm, coupled with Armijo's step size adaptation rule [1], and a modified Newton algorithm are proposed for optimization on both the complex Stiefel and the complex Grassmann manifolds. These algorithms do not employ a geodesic motion, but geodesic motion could be used in the general framework. A local parametrization based on a Euclidean projection of the tangent space onto the manifold is used in [2]. Hence, the computational cost may be reduced. Moreover, it is suggested that the geodesic motion is not the only solution, since there is no direct connection between the Riemannian geometry of the Stiefel (or Grassmann) manifold (i.e., the "constrained surface") and an arbitrary cost function.

The SD algorithms proposed in this paper operate on the Lie group of unitary matrices $U(n)$. We have derived the Riemannian gradient needed in the optimization on $U(n)$. We choose to follow a geodesic motion. This is justified by the desirable property of $U(n)$ that the right multiplication is an isometry with respect to the canonical bi-invariant metric [35]. This allows us to translate the descent direction at any point in the group to the identity element and exploit the fact that the tangent space at the identity is the Lie algebra of skew-Hermitian matrices. This leads to lower computational complexity because the argument of the matrix exponential operation is skew-Hermitian. Novel methods for computing the matrix exponential of skew-symmetric matrices recently proposed in [30] and [32] may be exploited. Moreover, we show that an adaptive step size according to Armijo's rule [1] fits very well with the proposed algorithms.

B. Illustrative Example

We present a rather simple simulation example in order to illustrate how different algorithms operate under the unitary constraint. We consider the Lie group of unit-norm complex numbers $U(1)$, which are the $1 \times 1$ unitary matrices. The unitary constraint is in this case the unit circle. We minimize a cost function $\mathcal{J}(w)$, $w \in \mathbb{C}$, subject to $|w| = 1$. Five different algorithms are considered. The first one is the unconstrained SD algorithm on the Euclidean space, with the corresponding update $w_{k+1} = w_k - \mu\,\partial\mathcal{J}/\partial w^*$, where $\mu$ is the step size.


Fig. 1. Minimization of a cost function on the unit circle $U(1)$: Euclidean versus Riemannian SD methods.

The second one is the same SD with the unit-norm constraint enforced after each iteration, $w_{k+1} \leftarrow w_{k+1}/|w_{k+1}|$. The third method is similar to the bigradient method [16] derived from the Lagrange multiplier method. An extra penalty weighted by a parameter $\rho$ is added to the original cost function in order to penalize the deviation from the unit norm. In this case, the update is the Euclidean SD step applied to the penalized cost $\mathcal{J}(w) + \rho\,(|w|^2 - 1)^2$. The fourth SD algorithm operates on the right parameter space determined by the constraint. At each point the algorithm takes a direction tangent to the unit circle, and the resulting point is projected back to the unit circle. The corresponding update is $w_{k+1} = \pi(w_k - \mu\,d_k)$, where $\mu$ is the step size, $d_k$ is the tangent SD direction, and $\pi(\cdot)$ is the operator which projects an arbitrary point to the closest point on the unit circle in terms of Euclidean norm. The fifth algorithm is the multiplicative-update SD algorithm derived in this paper. The corresponding update is a rotation, i.e., $w_{k+1} = e^{-\mu\omega_k}\,w_k$, where $\omega_k = 2\jmath\,\Im\{\gamma_k w_k^*\}$, $\gamma_k$ is the Euclidean gradient $\partial\mathcal{J}/\partial w^*$ at $w_k$, and $\Im\{\cdot\}$ denotes the imaginary part. The parameter $\mu$ represents the step size. The starting point is the same for all the algorithms (see Fig. 1). The point minimizing the cost function over the whole complex plane is an undesired minimum because it does not satisfy the constraint. The desired optimum lies on the unit circle, where the constraint is satisfied. We may notice in Fig. 1 that the unconstrained SD takes the SD direction in $\mathbb{C}$ and goes straight to the undesired minimum. By enforcing the unit-norm constraint, we project the current point radially onto the unit circle. The enforcing is necessary at every iteration in order to avoid the undesired minimum. The extra-penalty SD algorithm follows the unconstrained SD in the first iteration, since initially the extra-penalty term is equal to zero. It converges somewhere between the desired and the undesired minimum. The SD algorithm [2] on the space determined by the constraint takes in this case the SD direction tangent to the unit circle, and the resulting point is projected to the closest point on the unit circle in terms of Euclidean norm. The proposed SD algorithm uses a multiplicative update which is a phase rotation. The phase is proportional to the imaginary part of the complex number associated with the point. For this reason, the constraint is satisfied at every step in a natural way. Although this low-dimensional example is rather trivial, it has been included for illustrative purposes. In the case of multidimensional unitary matrices, a similar behavior is encountered.
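To make the example concrete, the following sketch compares three of these updates numerically. The cost $\mathcal{J}(w) = |w - 2|^2$, its conjugate gradient $\partial\mathcal{J}/\partial w^* = w - 2$, and the starting point are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# U(1) example: unconstrained SD, SD with re-normalization, and the
# multiplicative (rotation) update. Assumed cost: J(w) = |w - 2|^2,
# with Euclidean conjugate gradient dJ/dw* = w - 2.
def grad(w):
    return w - 2.0

mu = 0.1
w_sd = w_proj = w_rot = np.exp(3j * np.pi / 4)   # common starting point

for _ in range(200):
    # 1) Unconstrained Euclidean SD: drifts off the unit circle.
    w_sd = w_sd - mu * grad(w_sd)
    # 2) Euclidean SD with the unit norm enforced after each step.
    w_proj = w_proj - mu * grad(w_proj)
    w_proj = w_proj / abs(w_proj)
    # 3) Multiplicative update derived in Section III, specialized to
    #    n = 1: omega = gamma*conj(w) - w*conj(gamma) is purely imaginary,
    #    so exp(-mu*omega) is a pure phase rotation.
    g = grad(w_rot)
    omega = g * np.conj(w_rot) - w_rot * np.conj(g)
    w_rot = np.exp(-mu * omega) * w_rot             # stays on the circle

print(w_sd)    # -> 2, the undesired unconstrained minimum
print(w_proj)  # -> 1, the constrained optimum
print(w_rot)   # -> 1, constraint satisfied at every step
```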

III. ALGORITHM DERIVATION

In this section, we derive two generic SD algorithms on the Lie group of unitary matrices. Consider a real-valued cost function of a complex matrix $\mathbf{W} \in \mathbb{C}^{n\times n}$, i.e., $\mathcal{J}: \mathbb{C}^{n\times n} \to \mathbb{R}$. Our goal is to minimize (or maximize) the function $\mathcal{J}(\mathbf{W})$ under the constraint that $\mathbf{W}^H\mathbf{W} = \mathbf{W}\mathbf{W}^H = \mathbf{I}$, i.e., $\mathbf{W}$ is unitary. We proceed as follows. First, in Section III-A we describe the Lie group $U(n)$ of unitary matrices, which is a real differentiable manifold. Moreover, we describe the real differentiation of functions defined in complex spaces in a way which is suitable for the optimization. In Section III-B, we introduce the Riemannian metric on the Lie group $U(n)$. The definition of the gradient on the Riemannian space is intimately related to this metric. The Riemannian gradient is derived in Section III-C, and a basic generic optimization algorithm is given in Section III-D. Finally, a Riemannian SD algorithm with an adaptive step size is given in Section III-E.

A. Differentiation of Functions Defined in Complex Spaces

A Lie group is defined to be a differentiable manifold with a smooth, i.e., differentiable, group structure [36]. The Lie group of unitary matrices $U(n)$ is a real differentiable manifold because it is endowed with a real differentiable structure. Therefore, we deal with a real-valued cost function essentially defined in a real parameter space. However, since the algebraic properties of the group are defined in terms of the complex field, it is convenient to operate directly with the complex representation of the matrices instead of using separately their real and imaginary parts, i.e., without using reals for representing the complex space [43], [44]. Now the real differentiation can be described by a pair of complex-valued operators defined in terms of real differentials with respect to the real and imaginary parts [43], [45]

$$\frac{\partial}{\partial w} \triangleq \frac{1}{2}\left(\frac{\partial}{\partial x} - \jmath\frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial w^*} \triangleq \frac{1}{2}\left(\frac{\partial}{\partial x} + \jmath\frac{\partial}{\partial y}\right) \qquad (1)$$

with $w = x + \jmath y$ and $x, y \in \mathbb{R}$. If a function is holomorphic (analytic), the first differential operator in (1) coincides with the complex differential and the second one is identically zero (the Cauchy-Riemann equations). It should be noted that a real-valued function is holomorphic only if it is a constant. Therefore, complex analyticity is irrelevant to optimization problems. The above representation is more compact, allows differentiation of complex-argument functions without using reals for the representation, and is appropriate for many applications [45].
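As a sanity check of (1), the following snippet evaluates both Wirtinger operators by finite differences for the real-valued cost $J(w) = |w|^2$; the function name and step size are illustrative.

```python
import numpy as np

# Numerical check of the operators in (1) for J(w) = |w|^2 = x^2 + y^2:
# dJ/dx = 2x, dJ/dy = 2y, hence
#   dJ/dw  = 0.5*(dJ/dx - j*dJ/dy) = conj(w)
#   dJ/dw* = 0.5*(dJ/dx + j*dJ/dy) = w   <- steepest ascent direction
def wirtinger(J, w, h=1e-6):
    dx = (J(w + h) - J(w - h)) / (2 * h)            # d/d Re{w}
    dy = (J(w + 1j * h) - J(w - 1j * h)) / (2 * h)  # d/d Im{w}
    return 0.5 * (dx - 1j * dy), 0.5 * (dx + 1j * dy)

J = lambda w: abs(w) ** 2
w = 1.3 - 0.7j
d_w, d_wstar = wirtinger(J, w)
print(d_w, np.conj(w))   # both ~ 1.3 + 0.7j
print(d_wstar, w)        # both ~ 1.3 - 0.7j
```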


B. Riemannian Structure on $U(n)$

Fig. 2. Illustrative example representing the tangent space $T_{\mathbf{W}}U(n)$ at a point $\mathbf{W} \in U(n)$, and a tangent vector $\mathbf{X} \in T_{\mathbf{W}}U(n)$.

A differentiable function $\mathbf{Y}: (-\epsilon, \epsilon) \to U(n)$ represents a curve on the smooth manifold (see Fig. 2). Let $\mathbf{W} = \mathbf{Y}(0)$ and let $\mathfrak{F}_{\mathbf{W}}$ be the set of functions on $U(n)$ that are differentiable at $\mathbf{W}$. The tangent vector to the curve $\mathbf{Y}$ at $\mathbf{W}$ is the function $\dot{\mathbf{Y}}(0)$ mapping $\mathfrak{F}_{\mathbf{W}}$ into $\mathbb{R}$, given by

$$\dot{\mathbf{Y}}(0)f = \frac{d\,(f \circ \mathbf{Y})}{dt}\bigg|_{t=0}. \qquad (2)$$

A tangent vector at $\mathbf{W}$ is the tangent vector at $t = 0$ of some curve $\mathbf{Y}(t)$ with $\mathbf{Y}(0) = \mathbf{W}$. All the tangent vectors at a point form the tangent space $T_{\mathbf{W}}U(n)$. The tangent space is a real vector space attached to every point in the differentiable manifold. It should be noted that the value $\dot{\mathbf{Y}}(0)f$ is independent of the choice of local coordinates (chart) and independent of the curve as long as $\mathbf{Y}(0) = \mathbf{W}$ and the velocity at $t = 0$ is fixed [35]. Since the curve $\mathbf{Y}(t) \in U(n)$, we have $\mathbf{Y}^H(t)\mathbf{Y}(t) = \mathbf{I}$. Differentiating both sides with respect to $t$, the tangent space at $\mathbf{W}$ may be identified with the $n^2$-dimensional real vector space (i.e., it is a vector space isomorphic to)

$$T_{\mathbf{W}}U(n) = \{\mathbf{X} \in \mathbb{C}^{n\times n} : \mathbf{X}^H\mathbf{W} + \mathbf{W}^H\mathbf{X} = \mathbf{0}\}. \qquad (3)$$

From (3), it follows that the tangent space of $U(n)$ at the group identity $\mathbf{I}$ is the real Lie algebra of skew-Hermitian matrices $\mathfrak{u}(n) = \{\mathbf{S} \in \mathbb{C}^{n\times n} : \mathbf{S} = -\mathbf{S}^H\}$. We emphasize that $\mathfrak{u}(n)$ is not a complex vector space (Lie algebra), because the skew-Hermitian matrices are not closed under multiplication with complex scalars. For example, if $\mathbf{S}$ is a skew-Hermitian matrix, then $\jmath\mathbf{S}$ is Hermitian. Let $\mathbf{X}$ and $\mathbf{Y}$ be two tangent vectors, i.e., $\mathbf{X}, \mathbf{Y} \in T_{\mathbf{W}}U(n)$. The inner product in $T_{\mathbf{W}}U(n)$ is given by

$$\langle \mathbf{X}, \mathbf{Y} \rangle = \tfrac{1}{2}\,\Re\{\mathrm{tr}(\mathbf{X}\mathbf{Y}^H)\}. \qquad (4)$$

This inner product induces a bi-invariant Riemannian metric on the Lie group [35]. We may define the normal space at $\mathbf{W}$ considering that $U(n)$ is embedded in the ambient space $\mathbb{C}^{n\times n}$, equipped with the Euclidean metric. The normal space $N_{\mathbf{W}}U(n)$ is the orthogonal complement of the tangent space with respect to the metric of the ambient space [24], i.e., for any $\mathbf{X} \in T_{\mathbf{W}}U(n)$ and $\mathbf{N} \in N_{\mathbf{W}}U(n)$, we have $\Re\{\mathrm{tr}(\mathbf{N}\mathbf{X}^H)\} = 0$. It follows that the normal space at $\mathbf{W}$ is given as

$$N_{\mathbf{W}}U(n) = \{\mathbf{W}\mathbf{H} : \mathbf{H} = \mathbf{H}^H\}. \qquad (5)$$

C. The SD Direction on the Riemannian Space

We consider a differentiable cost function $\mathcal{J}: U(n) \to \mathbb{R}$. Intuitively, the SD direction is defined as "the direction where the cost function decreases the fastest per unit length." Having the Riemannian metric, we are now able to derive the Riemannian gradient we are interested in. A tangent vector $\tilde{\nabla}\mathcal{J}(\mathbf{W}) \in T_{\mathbf{W}}U(n)$ satisfying for all $\mathbf{X} \in T_{\mathbf{W}}U(n)$ the condition

$$\Re\{\mathrm{tr}(\Gamma_{\mathbf{W}}\mathbf{X}^H)\} = \langle \tilde{\nabla}\mathcal{J}(\mathbf{W}), \mathbf{X} \rangle \qquad (6)$$

is the gradient on $U(n)$ evaluated at $\mathbf{W}$. The direction $\Gamma_{\mathbf{W}} \triangleq \partial\mathcal{J}/\partial\mathbf{W}^*$ defined in (1) represents the steepest ascent direction of the cost function of complex argument on the Euclidean space at a given $\mathbf{W}$ [45]. The left-hand side (LHS) in (6) represents an inner product in the ambient space, whereas the right-hand side (RHS) represents a Riemannian inner product at $\mathbf{W}$. Equation (6) may be written as

$$\Re\{\mathrm{tr}[(\Gamma_{\mathbf{W}} - \tfrac{1}{2}\tilde{\nabla}\mathcal{J}(\mathbf{W}))\,\mathbf{X}^H]\} = 0. \qquad (7)$$

Equation (7) shows that the difference $\Gamma_{\mathbf{W}} - \frac{1}{2}\tilde{\nabla}\mathcal{J}(\mathbf{W})$ is orthogonal to all $\mathbf{X} \in T_{\mathbf{W}}U(n)$. Therefore, it lies in the normal space (5), i.e.,

$$\tilde{\nabla}\mathcal{J}(\mathbf{W}) = 2(\Gamma_{\mathbf{W}} - \mathbf{W}\mathbf{H}) \qquad (8)$$

where $\mathbf{H}$ is a Hermitian matrix determined by imposing the condition that $\tilde{\nabla}\mathcal{J}(\mathbf{W}) \in T_{\mathbf{W}}U(n)$. From (3) it follows that

$$\tilde{\nabla}\mathcal{J}(\mathbf{W})^H\mathbf{W} + \mathbf{W}^H\tilde{\nabla}\mathcal{J}(\mathbf{W}) = \mathbf{0}. \qquad (9)$$

From (8) and (9), we get the expression for the Hermitian matrix, $\mathbf{H} = \frac{1}{2}(\Gamma_{\mathbf{W}}^H\mathbf{W} + \mathbf{W}^H\Gamma_{\mathbf{W}})$. The gradient of the cost function on the Lie group of unitary matrices at $\mathbf{W}$ may be written by using (8) as follows:

$$\tilde{\nabla}\mathcal{J}(\mathbf{W}) = \Gamma_{\mathbf{W}} - \mathbf{W}\Gamma_{\mathbf{W}}^H\mathbf{W}. \qquad (10)$$

D. Moving Towards the SD Direction in $U(n)$

Here, we introduce a generic Riemannian SD algorithm along geodesics on the Lie group of unitary matrices $U(n)$. A geodesic curve on a Riemannian manifold is defined as a curve for which the second derivative is zero or lies in the normal space for all $t$ (i.e., the acceleration vector stays normal to the direction of motion as long as the curve is traced with constant speed). Locally the geodesics minimize the path length with respect to the Riemannian metric [35, p. 67]. A geodesic emanating from the identity $\mathbf{I}$ with a velocity $\mathbf{S} \in \mathfrak{u}(n)$ is characterized by the exponential map:

$$\mathbf{G}(t) = \exp(t\mathbf{S}). \qquad (11)$$
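The following sketch verifies numerically that (10) produces a tangent vector in the sense of (3). The toy cost $\mathcal{J}(\mathbf{W}) = \Re\,\mathrm{tr}(\mathbf{A}\mathbf{W})$ and its conjugate gradient are assumptions made for illustration only.

```python
import numpy as np

# Riemannian gradient (10) for an assumed toy cost J(W) = Re tr(A W),
# whose Euclidean conjugate gradient is Gamma = dJ/dW* = 0.5*A^H
# (illustrative choice, not from the paper).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))  # random unitary point

Gamma = 0.5 * A.conj().T
R = Gamma - W @ Gamma.conj().T @ W        # Riemannian gradient, eq. (10)

# Tangency check (3): R^H W + W^H R should vanish.
print(np.linalg.norm(R.conj().T @ W + W.conj().T @ R))   # ~1e-15
```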


Fig. 3. The geodesic emanating from the identity in the direction of $-\tilde{\mathbf{G}}_k$, ending at $\mathbf{P}_k = \exp(-\mu\tilde{\mathbf{G}}_k)$.

The exponential of complex matrices is given by the convergent power series $\exp(\mathbf{S}) = \sum_{m=0}^{\infty} \mathbf{S}^m/m!$. The optimization of $\mathcal{J}(\mathbf{W})$ is carried out along geodesics on the constraint surface. For the optimization we need the equation of the geodesic emanating from $\mathbf{W} \in U(n)$. This may be found by taking into account the fact that the right translation in $U(n)$ is an isometry with respect to the metric given by (4), and an isometry maps geodesics to geodesics [10], [35]. Therefore, $\mathbf{G}_{\mathbf{W}}(t) = \mathbf{G}(t)\mathbf{W}$. It follows that the geodesic emanating from $\mathbf{W}$ with velocity $\mathbf{S} \in \mathfrak{u}(n)$ is

$$\mathbf{G}_{\mathbf{W}}(t) = \exp(t\mathbf{S})\,\mathbf{W}. \qquad (12)$$

Consequently, we need to translate the gradient of the cost function at $\mathbf{W}_k$ (10) to the identity, i.e., into the Lie algebra $\mathfrak{u}(n)$. Since the differential of the right translation is a vector space isomorphism, this is performed simply by postmultiplying $\tilde{\nabla}\mathcal{J}(\mathbf{W}_k)$ by $\mathbf{W}_k^H$, i.e.,

$$\tilde{\mathbf{G}}_k \triangleq \tilde{\nabla}\mathcal{J}(\mathbf{W}_k)\,\mathbf{W}_k^H = \Gamma_k\mathbf{W}_k^H - \mathbf{W}_k\Gamma_k^H. \qquad (13)$$

We have to keep in mind that this is not the Riemannian gradient of the cost function evaluated at the identity. The tangent vector $\tilde{\mathbf{G}}_k$ is the Riemannian gradient of the cost function evaluated at $\mathbf{W}_k$ and translated to the identity. Note that the argument of the matrix exponential operation is skew-Hermitian. We exploit this very important property later in this paper in order to reduce the computational complexity.

The cost function $\mathcal{J}(\mathbf{W})$ may be minimized iteratively by using a geodesic motion. Typically we start at $\mathbf{W}_0 = \mathbf{I}$. We choose the direction in $\mathfrak{u}(n)$ to be the negative direction of the gradient, i.e., $-\tilde{\mathbf{G}}_k$ (13). Moving from $\mathbf{W}_k$ to $\mathbf{W}_{k+1}$ is equivalent to moving from $\mathbf{I}$ to $\mathbf{P}_k = \exp(-\mu\tilde{\mathbf{G}}_k)$, as shown in Fig. 3. The geodesic motion in $U(n)$ corresponds to multiplication by a rotation matrix $\mathbf{P}_k = \exp(-\mu\tilde{\mathbf{G}}_k)$. The parameter $\mu > 0$ controls the magnitude of the tangent vector and consequently the algorithm convergence speed. The update corresponding to the SD algorithm along geodesics on $U(n)$ is given by

$$\mathbf{W}_{k+1} = \exp(-\mu\tilde{\mathbf{G}}_k)\,\mathbf{W}_k. \qquad (14)$$

TABLE I. THE BASIC RIEMANNIAN SD ALGORITHM ON $U(n)$.

The algorithm is summarized in Table I. Practical algorithms require the computation of the exponential map, which is addressed in Section V.
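A minimal sketch of the basic update (14) with a fixed step size, in the spirit of Table I, is given below; it reuses the illustrative toy cost from the previous snippet and assumes SciPy's expm for the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

# Basic geodesic SD (Table I) with a fixed step size. The cost
# J(W) = Re tr(A W) and its conjugate gradient Gamma = 0.5*A^H are
# illustrative assumptions, not from the paper.
def riemannian_sd(A, n, mu=0.1, iters=200):
    W = np.eye(n, dtype=complex)                        # W_0 = I
    for _ in range(iters):
        Gamma = 0.5 * A.conj().T                        # Euclidean gradient
        G = Gamma @ W.conj().T - W @ Gamma.conj().T     # eq. (13), skew-Hermitian
        if np.linalg.norm(G) < 1e-9:                    # near a stationary point
            break
        W = expm(-mu * G) @ W                           # rotation update (14)
    return W

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = riemannian_sd(A, n)
print(np.linalg.norm(W.conj().T @ W - np.eye(n)))   # unitarity kept, ~1e-14
```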


E. A Self-Tuning Riemannian SD Algorithm on $U(n)$

An optimal value of the step size $\mu$ is difficult to determine in practice. Moreover, it is cost function dependent, and the appropriate step size may change at each iteration. The SD algorithm with a fixed small step size converges in general close to a local minimum. It trades off between high convergence speed, which requires a large step size, and low steady-state error, which requires a small step size. An adaptive step size is often a desirable choice.

In [27], a projection algorithm is considered together with three other optimization alternatives along geodesics in the real case, i.e., on the orthogonal group. The first geodesic algorithm in [27] uses a fixed step size, which leads to the "real-valued counterpart" of the algorithm in Table I. The second one is a geodesic search for computing the step size in the update equation. If the geodesic search is performed in a continuous domain [39], [40], it is computationally very expensive since it involves differential equations. A discretized version of the geodesic search may be employed. Two such methods are reviewed in [27]. The third alternative is a stochastic type of algorithm which adds a perturbation to the search direction.

We opt for a fourth alternative based on the Armijo step size [1]. It allows reducing the computational complexity and gives optimal local performance. This type of algorithm takes an initial step along the geodesic. Then, two other possibilities are checked by evaluating the cost function for the case of doubling or halving the step size. The doubling or halving continues as long as the step size is out of a range whose limits are set by two inequalities. It is known that in a stationary scenario (i.e., the matrices involved in the cost function are time invariant) the SD algorithm together with the Armijo step size rule [1] almost always converges to a local minimum if not initialized at a stationary point. The convergence properties of the geodesic SD algorithm using the Armijo rule have been established in [29], [46] for general Riemannian manifolds, provided that the cost function is continuously differentiable and has bounded level sets. The first condition is an underlying assumption in this paper, and the second one is ensured by the compactness of $U(n)$.

In [2], an SD algorithm is coupled with the Armijo rule for optimizing the step size. Geodesic motion is not used. Nevertheless, in the general framework proposed in [2] it could be used. We show that by using the Armijo rule together with the generic SD algorithm along geodesics, the computational complexity is reduced by exploiting the properties of the exponential map, as will be shown later.

The generic SD algorithm with adaptive step size selection is summarized in Table II. The choice for computing the matrix exponential is explained in Section V.

TABLE II. THE SELF-TUNING RIEMANNIAN SD ALGORITHM ON $U(n)$.

Algorithm Description: The algorithm consists of the following steps.
• Step 1—Initialization: A typical initial value is $\mathbf{W}_0 = \mathbf{I}$. If the gradient $\tilde{\mathbf{G}}_0 = \mathbf{0}$, then the identity element is a stationary point. In that case a different initial value $\mathbf{W}_0 \in U(n)$ may be chosen.
• Steps 2–3—Gradient computation: The Euclidean gradient $\Gamma_k$ and the Riemannian gradient $\tilde{\mathbf{G}}_k$ are computed.
• Step 4—Setting the threshold for the final error: Evaluate the squared norm of the Riemannian gradient $\|\tilde{\mathbf{G}}_k\|_F^2$ in order to check if we are sufficiently close to the minimum of the cost function. The residual error may be set to a value closest to the smallest value available in the limited-precision environment, or the highest value which can be tolerated in the task at hand.
• Step 5—Rotation matrix computation: This step requires the computation of the rotation matrix $\mathbf{P}_k = \exp(-\mu\tilde{\mathbf{G}}_k)$. The rotation matrix $\mathbf{P}_k^2$ may be computed just by squaring $\mathbf{P}_k$, because $\exp(-2\mu\tilde{\mathbf{G}}_k) = [\exp(-\mu\tilde{\mathbf{G}}_k)]^2$. Therefore, when doubling the step size, instead of computing a new matrix exponential, only a matrix squaring operation is needed. It is important to mention that squaring is a very stable operation [32], being also used in software packages for computing the matrix exponential.
• Steps 6 and 7—Step size evaluation: In every iteration we check if the step size is in the appropriate range determined by the two inequalities. The step size evolves on a dyadic basis. If it is too small it will be doubled, and if it is too high it will be halved.
• Step 8—Update: The new update is obtained in a multiplicative manner, and a new iteration is started with step 2 if the residual error is not sufficiently small.

Remark: The SD algorithm in Table II may be easily converted into a steepest ascent algorithm. The only difference is that the step size would be negative and the inequalities in steps 6 and 7 need to be reversed.
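The following sketch illustrates the doubling-by-squaring idea of Table II. The two Armijo inequalities are simplified here to plain decrease tests, and the cost function and gradient are assumed to be supplied by the application, so this is an outline rather than the paper's exact procedure.

```python
import numpy as np
from scipy.linalg import expm

# Self-tuning geodesic SD sketch: exp(-2*mu*G) = exp(-mu*G)^2, so
# doubling the step costs only one matrix squaring. J(W) and
# euclid_grad(W) are assumed callables provided by the application.
def self_tuning_sd(J, euclid_grad, W0, mu=0.5, iters=100, tol=1e-9):
    W = W0.copy()
    for _ in range(iters):
        Gamma = euclid_grad(W)
        G = Gamma @ W.conj().T - W @ Gamma.conj().T    # eq. (13)
        if np.linalg.norm(G) ** 2 < tol:               # Step 4
            break
        P = expm(-mu * G)                              # Step 5
        # Step 6 (simplified): double while the doubled step still helps.
        while J(P @ P @ W) < J(P @ W):
            P = P @ P                                  # squaring, no new expm
            mu *= 2.0
        # Step 7 (simplified): halve while the step fails to decrease J.
        while J(P @ W) > J(W) and mu > 1e-12:
            mu /= 2.0
            P = expm(-mu * G)
        W = P @ W                                      # Step 8
    return W
```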
IV. RELATIONS AMONG DIFFERENT LOCAL PARAMETRIZATIONS ON THE UNITARY GROUP

The proposed SD algorithms search for the minimum by moving along geodesics, i.e., the local parametrization is the exponential map. Other local parametrizations used to describe a small neighborhood of a point in the group have been proposed in [2] and [11]. In this section, we establish equivalence relationships among some different local parametrizations of the unitary group $U(n)$. The first one is the exponential map used in the proposed SD algorithms, the second one is the Cayley transform [31], the third one is the Euclidean projection operator [2], and the fourth one is a parametrization based on the QR-decomposition. The four parametrizations lead to different update rules for the basic SD algorithm on $U(n)$. The update expressions may be described in terms of Taylor series expansions, and we prove the equivalence among all of them up to a certain approximation order.

A. The Exponential Map

The rotation matrix used in the update expression (14) of the proposed algorithm can be expressed as a Taylor series expansion of the matrix exponential, i.e., $\exp(-\mu\tilde{\mathbf{G}}_k) = \mathbf{I} - \mu\tilde{\mathbf{G}}_k + \frac{\mu^2}{2}\tilde{\mathbf{G}}_k^2 - \cdots$. The update is equivalent to

$$\mathbf{W}_{k+1} = \left(\mathbf{I} - \mu\tilde{\mathbf{G}}_k + \frac{\mu^2}{2}\tilde{\mathbf{G}}_k^2 - \frac{\mu^3}{6}\tilde{\mathbf{G}}_k^3 + \cdots\right)\mathbf{W}_k. \qquad (15)$$

B. The Cayley Transform

If the rotation matrix in the update is computed by using the CT [31] instead of the matrix exponential, then $\mathbf{P}_k = (\mathbf{I} + \frac{\mu}{2}\tilde{\mathbf{G}}_k)^{-1}(\mathbf{I} - \frac{\mu}{2}\tilde{\mathbf{G}}_k)$. The corresponding Taylor series is $\mathbf{I} - \mu\tilde{\mathbf{G}}_k + \frac{\mu^2}{2}\tilde{\mathbf{G}}_k^2 - \frac{\mu^3}{4}\tilde{\mathbf{G}}_k^3 + \cdots$, and the update equation is

$$\mathbf{W}_{k+1} = \left(\mathbf{I} - \mu\tilde{\mathbf{G}}_k + \frac{\mu^2}{2}\tilde{\mathbf{G}}_k^2 - \frac{\mu^3}{4}\tilde{\mathbf{G}}_k^3 + \cdots\right)\mathbf{W}_k. \qquad (16)$$

Obviously, (16) is equivalent to (15) up to the second order. Notice also that the CT equals the first-order diagonal Padé approximation of the matrix exponential (see [32]).
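A one-routine sketch of the CT rotation in (16) follows; since the argument is skew-Hermitian, the result is exactly unitary up to round-off.

```python
import numpy as np

# Cayley-transform rotation (16): the first-order diagonal Pade
# approximant of exp(-mu*G), exactly unitary when G is skew-Hermitian.
def cayley_rotation(G, mu):
    I = np.eye(G.shape[0], dtype=complex)
    # (I + mu/2 G)^{-1} (I - mu/2 G), computed by a linear solve
    return np.linalg.solve(I + 0.5 * mu * G, I - 0.5 * mu * G)

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = B - B.conj().T                       # skew-Hermitian test direction
R = cayley_rotation(G, 0.1)
print(np.linalg.norm(R.conj().T @ R - np.eye(n)))   # ~1e-15: unitary
```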


C. The Euclidean Projection Map

Another possibility is to use a Euclidean projection map as a local parametrization, as in [2]. This map projects an arbitrary matrix $\mathbf{X} \in \mathbb{C}^{n\times n}$ onto $U(n)$ at the point which is closest to $\mathbf{X}$ in terms of Euclidean norm, i.e., $\pi(\mathbf{X}) = \arg\min_{\mathbf{U} \in U(n)} \|\mathbf{X} - \mathbf{U}\|_F$. The unitary matrix minimizing the above norm can be obtained from the polar decomposition of $\mathbf{X}$, as in [47]

$$\pi(\mathbf{X}) = \mathbf{U}\mathbf{V}^H \qquad (17)$$

where $\mathbf{U}$ and $\mathbf{V}$ are the left and the right singular vectors of $\mathbf{X}$, respectively. Equivalently,

$$\pi(\mathbf{X}) = \mathbf{X}(\mathbf{X}^H\mathbf{X})^{-1/2}. \qquad (18)$$

Equation (18) is also known as "the symmetric orthogonalization" procedure [12]. In [2], the local parametrizations are more general in the sense that they are chosen for the Stiefel and Grassmann manifolds. The projection operation is computed via the SVD as in (17). For the SD algorithm on the Stiefel manifold [2], the update is of the form $\mathbf{W}_{k+1} = \pi(\mathbf{W}_k - \mu\mathbf{D}_k)$, where $\mathbf{D}_k$ is the SD direction on the manifold and $\mu$ is the step size. According to (18) the update is equivalent to $\mathbf{W}_{k+1} = (\mathbf{W}_k - \mu\mathbf{D}_k)[(\mathbf{W}_k - \mu\mathbf{D}_k)^H(\mathbf{W}_k - \mu\mathbf{D}_k)]^{-1/2}$. By expanding the above expression in a Taylor series we get

$$\mathbf{W}_{k+1} = \mathbf{W}_k - \mu\tilde{\mathbf{G}}_k\mathbf{W}_k + \frac{\mu^2}{2}\tilde{\mathbf{G}}_k^2\mathbf{W}_k + O(\mu^3). \qquad (19)$$

Considering that $\mathbf{D}_k = \tilde{\mathbf{G}}_k\mathbf{W}_k$ and that $\tilde{\mathbf{G}}_k^H = -\tilde{\mathbf{G}}_k$, the three update expressions (15), (16), and (19) become equivalent up to the second order.

D. The Projection Based on the QR-Decomposition

A computationally inexpensive way to approximate the optimal projection (18) of an arbitrary matrix onto $U(n)$ is the QR-decomposition. We show that this is not the optimal projection in terms of minimum Euclidean distance, but it is accurate enough to be used in practical applications. We establish a connection between the QR-based projection and the optimal projection. Let us consider the QR-decomposition of the arbitrary nonsingular matrix $\mathbf{X} \in \mathbb{C}^{n\times n}$ given by $\mathbf{X} = \mathbf{Q}\mathbf{R}$, where $\mathbf{R}$ is an upper-triangular matrix and $\mathbf{Q}$ is a unitary matrix. The unitary matrix $\mathbf{Q}$ is an approximation of the optimal projection of $\mathbf{X}$ onto $U(n)$, i.e., $\mathbf{Q} \approx \pi(\mathbf{X})$. The connection between this projection and the optimal projection can be established by using the polar decomposition of the upper-triangular matrix $\mathbf{R}$. We obtain $\pi(\mathbf{X}) = \mathbf{Q}\mathbf{P}_R$, where $\mathbf{P}_R = \mathbf{R}(\mathbf{R}^H\mathbf{R})^{-1/2}$ and $\mathbf{P}_R \in U(n)$. Therefore, the matrix $\mathbf{Q}$ is an approximation of the optimal projection, and the optimal projection includes an additional rotation $\mathbf{P}_R$ from $U(n)$. The update of the SD algorithm is equal to the unitary factor from the QR-decomposition. In other words, if we have $\mathbf{W}_k - \mu\mathbf{D}_k = \mathbf{Q}_k\mathbf{R}_k$, then

$$\mathbf{W}_{k+1} = \mathbf{Q}_k. \qquad (20)$$

The equivalence up to the first order between the update (20) and the other update expressions (15), (16), and (19) is obtained by expanding the columns of the matrix $\mathbf{Q}_k$ from the Gram-Schmidt process separately in a Taylor series (proof available on request)

$$\mathbf{W}_{k+1} = \mathbf{W}_k - \mu\tilde{\mathbf{G}}_k\mathbf{W}_k + O(\mu^2). \qquad (21)$$

The equivalence may extend to higher orders, but this remains to be studied.
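The sketch below contrasts the optimal projection (17) with its QR-based approximation (20) on a random matrix; the phase normalization of the QR factors is an added implementation detail, not taken from the paper.

```python
import numpy as np

# Optimal projection (17)/(18) onto U(n) versus the QR approximation.
rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

U, _, Vh = np.linalg.svd(X)
P_opt = U @ Vh                            # optimal projection, eq. (17)

Q, R = np.linalg.qr(X)
# Normalize the phases so that diag(R) becomes real and positive; this
# makes Q a consistent approximation of P_opt.
D = np.diag(np.diag(R) / np.abs(np.diag(R)))
Q = Q @ D

print(np.linalg.norm(X - P_opt, 'fro'))  # minimal Euclidean distance
print(np.linalg.norm(X - Q, 'fro'))      # slightly larger, in general
```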
V. COMPUTATIONAL COMPLEXITY

In this section, we evaluate the computational complexity of the SD algorithms on $U(n)$ by considering separately the geodesic and the nongeodesic SD algorithms. The proposed geodesic SD algorithms use the rotational update of the form $\mathbf{W}_{k+1} = \mathbf{P}_k\mathbf{W}_k$, and the rotation matrix $\mathbf{P}_k$ is computed via the matrix exponential. We review the variety of algorithms available in the literature for calculating the matrix exponential in the proposed algorithms. Details are given for the matrix exponential algorithms with the most appealing properties. The nongeodesic SD algorithms are based on an update expression of the form $\mathbf{W}_{k+1} = \pi(\mathbf{W}_k - \mu\mathbf{D}_k)$, and the computational complexity of different cases is described. The cost of adapting the step size using the Armijo rule is also evaluated. The SD method on $U(n)$ involves an overall complexity of $O(n^3)$ flops¹ per iteration. Algorithms like the conjugate gradient or Newton algorithm are expected to provide faster convergence, but their complexity is also expected to be higher. Moreover, a Newton algorithm is more likely to converge to stationary points other than local minima.

¹One "flop" is defined as a complex addition or a complex multiplication. An operation of the form $ab + c$, $a, b, c \in \mathbb{C}$, is equivalent to two flops. A simple multiplication of two $n \times n$ matrices requires $2n^3$ flops. This is a quick evaluation of the computational complexity, not necessarily proportional to the computational speed.

A. Geodesic SD Algorithms on $U(n)$

In general, the geodesic motion on manifolds is computationally expensive. In the case of $U(n)$, the complexity is reduced even though it requires the computation of the matrix exponential. We are interested in the special case of the matrix exponential of the form $\exp(-\mu\tilde{\mathbf{G}})$, where $\mu \in \mathbb{R}$ and $\tilde{\mathbf{G}}$ is a skew-Hermitian matrix. Obviously, finding the exponential of a skew-Hermitian matrix has lower complexity than that of a general matrix. The matrix exponential operation maps the skew-Hermitian matrices from the Lie algebra $\mathfrak{u}(n)$ into unitary matrices which reside on the Lie group $U(n)$. Several alternatives for approximating the matrix exponential have been proposed in [30], [32], [48], and [49]. In general the term "approximation" may refer to two different things. The first kind of approximation maps the elements of the Lie algebra exactly into the Lie group, and the approximation takes place only in terms of deviation "within the constrained surface." Among the most efficient methods from this category are the diagonal Padé approximation [32], the Generalized Polar Decomposition [30], [48], and the technique of coordinates of the second kind [49]. The second category includes methods for which the resulting elements do not reside on the group anymore, i.e., $\mathbf{P}_k \notin U(n)$. The most popular methods belonging to
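For reference, the exponential of a skew-Hermitian matrix can also be computed exactly through its eigendecomposition, as sketched below; this is one more alternative alongside the methods discussed above, not the paper's recommended choice.

```python
import numpy as np

# Exact exponential of a skew-Hermitian matrix via eigendecomposition:
# G = U diag(j*theta) U^H with real theta, so
# expm(-mu*G) = U diag(exp(-j*mu*theta)) U^H is exactly unitary.
def expm_skew_hermitian(G, mu):
    theta, U = np.linalg.eigh(-1j * G)    # -1j*G is Hermitian, theta real
    return (U * np.exp(-1j * mu * theta)) @ U.conj().T

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
G = B - B.conj().T
R = expm_skew_hermitian(G, 0.3)
print(np.linalg.norm(R.conj().T @ R - np.eye(n)))   # ~1e-15
```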


this category are the truncated Taylor series and the nondiagonal Padé approximation. They do not preserve the algebraic properties, but they still provide reasonable performance in some applications [50]. Their accuracy may be improved by using them together with the scaling and squaring procedure.

1) Padé Approximation of the Matrix Exponential: This is [32, Method 2], and together with scaling and squaring [32, Method 3] it is considered to be one of the most efficient methods for approximating a matrix exponential. For normal matrices (i.e., matrices which satisfy $\mathbf{A}\mathbf{A}^H = \mathbf{A}^H\mathbf{A}$), the Padé approximation prevents round-off error accumulation. The skew-Hermitian matrices are normal matrices; therefore, they enjoy this benefit. Because we deal with an SD algorithm on $U(n)$, we are also concerned about preserving the Lie algebraic properties. The diagonal Padé approximation preserves the unitarity property accurately. The Padé approximation together with scaling and squaring supposes the choice of the Padé approximation order and the scaling and squaring exponent to get the best approximant given the approximation accuracy. See [32] for information on choosing the pair optimally. The complexity of this approximation is $O(n^3)$ flops. The drawback of the Padé method when used together with the scaling and squaring procedure is that if the norm of the argument is large, the computational efficiency decreases due to the repeated squaring.

2) Approximation of the Matrix Exponential via Generalized Polar Decomposition (GPD): The GPD method, recently proposed in [30], is consistent with the Lie group structure as it maps the elements of the Lie algebra exactly into the corresponding Lie group. The method lends itself to implementation in parallel architectures, and it requires $O(n^3)$ flops [30] regardless of the approximation order. It may not be the most efficient implementation in terms of flop count, but the algorithm has potential for highly parallel implementation. GPD algorithms based on splitting techniques have also been proposed in [48]. The corresponding approximation is less complex than the one in [30] for the second and the third order. The second-order approximation requires the same amount of computation needed to perform the CT. Other efficient approximations in a Lie-algebraic setting have been considered in [49] by using the technique of coordinates of the second kind (CSK); a second-order CSK approximant likewise requires $O(n^3)$ flops.

B. Nongeodesic SD Algorithms on $U(n)$

This category includes local parametrizations derived from a projection operator which is used to map arbitrary matrices into $U(n)$. The optimal projection and an approximation of it are considered.

1) Optimal Projection: The projection that minimizes the Euclidean distance between the arbitrary matrix and a matrix in $U(n)$ may be computed in different ways. By using the SVD, the computation of the projection requires $O(n^3)$ flops; the procedure (18) also requires $O(n^3)$ flops, with a different constant.

2) Approximation of the Optimal Projection: This method is the most inexpensive approximation of the optimal projection, being based on the QR-decomposition of the matrix $\mathbf{X}$. It requires only the unitary matrix $\mathbf{Q}$, which is an orthonormal basis of the range space of $\mathbf{X}$. This can be done by using Householder reflections, Givens rotations, or the Gram-Schmidt procedure [32]. The most computationally efficient and numerically stable approach is the modified Gram-Schmidt procedure, which requires only $O(n^3)$ flops.

TABLE III. THE COMPLEXITY (IN FLOPS) OF COMPUTING THE LOCAL PARAMETRIZATION IN $U(n)$.

In Table III, we summarize the complexity² of computing the local parametrizations for the geodesic and the nongeodesic methods, respectively. The geodesic methods include the diagonal Padé approximation with scaling and squaring (DPA+SS), the first-order diagonal Padé approximation [32] (i.e., the CT), the Generalized Polar Decomposition with reduction to tridiagonal form (GPD-IZ) [30], and the one without reduction to tridiagonal form (GPD-ZMK) [48]. All methods have an approximation order of two. The nongeodesic methods include the optimal projection (OP) and its approximation (AOP).

²Only dominant terms are reported, i.e., $O(n^3)$.

C. The Cost of Using an Adaptive Step Size

In this subsection, we analyze the computational complexity of adapting the step size with the Armijo rule. The total computational cost is given by the complexity of computing the local parametrization and the additional complexity of selecting the step size. Therefore, the step size adaptation is a critical aspect to be considered. We consider again the geodesic SD algorithms and the nongeodesic SD algorithms, respectively. We show that the geodesic methods may reduce the complexity of the step size adaptation.

1) The Geodesic SD Algorithms: Since the step size evolves on a dyadic basis, the geodesic methods are very suitable for the Armijo step. This is due to the fact that doubling the step size does not require any expensive computation, just squaring the rotation matrix as in the scaling and squaring procedure. For normal matrices, the computation of the matrix exponential via matrix squaring prevents round-off error accumulation [32]. An Armijo type of geodesic SD algorithm enjoys this benefit, since the argument of the matrix exponential is skew-Hermitian. Moreover, when the step size is halved, the corresponding rotation matrix may be available from the scaling and squaring procedure, which is often combined with other methods for approximating the matrix exponential. This allows reducing the complexity because the expensive operation may often be avoided.

2) The Nongeodesic SD Algorithms: The nongeodesic methods compute the update by projecting a matrix into $U(n)$. Unfortunately, the Armijo step size adaptation is relatively expensive in this case. The main reason is that the update $\pi(\mathbf{W}_k - \mu\mathbf{D}_k)$ and the one corresponding to the double step size, $\pi(\mathbf{W}_k - 2\mu\mathbf{D}_k)$, do not have a relationship as straightforward as squaring the rotation matrix for the geodesic methods. Thus, the projection operation needs to be computed multiple times. Moreover, even keeping the step size constant involves computing the projection twice,

since both inequalities 6 and 7 in Table II need to be tested even if they fail. In this case both projections, $\pi(\mathbf{W}_k - \mu\mathbf{D}_k)$ and $\pi(\mathbf{W}_k - 2\mu\mathbf{D}_k)$, need to be evaluated.

We compare the proposed geodesic SD algorithms to the nongeodesic SD algorithms by considering the complexity of adapting the step size. We also take into account the cost of computing the rotation matrix for the geodesic SD algorithms and the cost of computing the projection operation for the nongeodesic algorithms. These costs are given in Table III for different local parametrizations. The complexity depends on the number of times we double, respectively halve, the step size during one iteration. The complexity of adapting the step size is summarized in Table IV.

TABLE IV. THE COMPLEXITY OF ADAPTING THE STEP SIZE.

We may conclude that the local parametrization may be chosen based on the requirements of the application. Often, preserving the algebraic structure is important. On the other hand, the implementation complexity may be a limiting factor. Most of the parametrizations presented here are equivalent up to the second order. Therefore, the difference in convergence speed is not expected to be significant. The cost function to be minimized plays a role in this difference, as also stated in [2]. An SD algorithm with an adaptive step size is more suitable in practice. Consequently, the geodesic algorithms are a good choice. In this case, the matrix exponential is employed, and it may be computed either by using the CT [31] or the GPD-ZMK method [48]. They require an equal number of flops; therefore, the choice remains a matter of numerical stability. Even though the GPD-IZ method recently proposed in [30] is sensibly less efficient in terms of flop count, it may be faster in practice if implemented in parallel architectures. Moreover, it provides good numerical stability, as will be seen in the simulations. As a final conclusion, we would opt for the GPD-IZ method [30] if the algorithm is implemented in a parallel fashion, and for the CT if parallel computation is not an option.

VI. NUMERICAL STABILITY

In this section, we focus on the numerical stability of the proposed SD algorithms on $U(n)$. Taking into account the recursive nature of the algorithms, we analyze the deviation of each new update $\mathbf{W}_{k+1}$ from the unitary constraint, i.e., the departure from $U(n)$. The nongeodesic SD algorithms do not experience this problem due to the nature of the local parametrization. In that case, the error does not accumulate because the projection operator maps the update into the manifold at every new iteration. Therefore, we consider only the geodesic SD algorithms.

The methods proposed here for approximating the matrix exponential map the elements of the Lie algebra exactly into the Lie group; therefore, they do not cause deviation from the unitary constraint. However, the rotation matrix may be affected by round-off errors, and the error may accumulate in the update (14) due to the repeated matrix multiplications.

We provide a closed-form expression for the expected value of the deviation from the unitary constraint after a certain number of iterations. The theoretical value derived here predicts the error accumulation with high accuracy, as will be shown in the simulations. We show that the error accumulation is negligible in practice.

We assume that at each iteration $k$, the rotation matrix is affected additively by the quantization error $\mathbf{Q}_k$, i.e., $\tilde{\mathbf{P}}_k = \mathbf{P}_k + \mathbf{Q}_k$, where $\mathbf{P}_k$ is the true rotation matrix. The real and imaginary parts of the entries of the matrix $\mathbf{Q}_k$ are mutually independent and independent of the entry indices. They are assumed to be uniformly distributed within the quantization interval of width $\epsilon$. The deviation of the quantized update $\tilde{\mathbf{W}}_k$ from the unitary constraint is measured by

$$\Delta_k = \|\tilde{\mathbf{W}}_k^H\tilde{\mathbf{W}}_k - \mathbf{I}\|_F^2. \qquad (22)$$

The closed-form expression of the expected value of the deviation at iteration $k$ is given in (23) (derivation available on request). The theoretical value (23) depends on the matrix dimension $n$ and the width of the quantization interval $\epsilon$. Often, convergence is reached in just a few iterations, as in the practical example presented in Section VII. Therefore, the error accumulation problem is avoided. We show that even if convergence is achieved after a large number of iterations, the expected value of the deviation from the unitary constraint is negligible. The error increases very slowly, and the rate of increase decays rapidly with $k$, as will be shown in Section VII.
the choice remains upon the numerical stability. Even though tion VII.
the GPD-IZ method recently proposed in [30] is sensibly less
efficient in terms of flop count, it may be faster in practice if
VII. SIMULATION RESULTS AND APPLICATIONS
implemented in parallel architectures. Moreover, it provides
good numerical stability as it will be seen in the simulations. In this section, we test how the proposed method performs
As a final conclusion, we would opt for the GPD-IZ method in signal processing applications. An example of separating in-
[30] if the algorithm is implemented in a parallel fashion and dependent signals in a MIMO system is given. Applications to
for the CT if the parallel computation is not an option. array signal processing, ICA, BSS, for MIMO systems may be
found in [5]–[8], [10], [14]–[17], [19]–[23], [51]. A recent re-
VI. NUMERICAL STABILITY view of the applications of differential geometry to signal pro-
In this section, we focus on the numerical stability of the pro- cessing may be found in [52].
posed SD algorithms on . Taking into account the recur-
A. Blind Source Separation for MIMO Systems
sive nature of the algorithms, we analyze the deviation of each
new update from the unitary constraint, i.e., the depar- Separating signals blindly in a MIMO communication sys-
ture from . The nongeodesic SD algorithms do not experi- tems may be done by exploiting the statistical information of
ence this problem due to nature of local parametrization. In that the transmitted signals. The JADE algorithm [3] is a reliable
case, the error does not accumulate because the projection op- alternative for solving this problem. The JADE algorithm con-
erator maps the update into the manifold at every new iteration. sists of two stages. First, a prewhitening of the received signal is
Therefore, we consider only the geodesic SD algorithms. performed. The second stage is a unitary rotation. This second
The methods proposed here for approximating the matrix ex- stage is formulated as an optimization problem under unitary
ponential map the elements of the Lie algebra exactly into the matrix constraint, since no closed form solution can be given ex-
Lie group, therefore they do not cause deviation from the uni- cept for simple cases such as 2-by-2 unitary matrices. This may
tary constraint. However, the rotation matrix may be affected be efficiently solved by using the proposed SD on the unitary
by round-off errors, and the error may accumulate in the update group. It should be noted that the first stage can also be formu-
(14) due to the repeated matrix multiplications. lated as a unitary optimization problem [50], and the algorithms


proposed in this paper could be used to solve it. However, here we only focus on the second stage.

The JADE approach has been recently considered on the oblique [53] and Stiefel [19] manifolds. The SD algorithm in [19] has the same complexity per iteration as the original JADE [3], but in general it converges in fewer iterations. This is true especially for large matrix dimensions, where JADE seems to converge slowly due to its pairwise processing approach. Therefore, the overall complexity of the algorithm in [19] is lower than that of the original JADE. It operates on the Stiefel manifold of unitary matrices, but still without taking into account the additional Lie group structure of the manifold. Our proposed algorithm is designed specifically for the case of $n \times n$ unitary matrices, and for this reason the complexity per iteration is lower compared to the SD in [19]. The convergence speed is identical, as will be shown later. The algorithms in [2] and [19] are more general than the proposed one, in the sense that the parametrization is chosen for the Stiefel and the Grassmann manifolds. The reduction in complexity for the proposed algorithm is achieved by exploiting the additional group structure of $U(n)$. The SD along geodesics is also more suitable for the Armijo step size.

A number of independent zero-mean signals are sent by using $m$ transmit antennas, and they are received by $r$ receive antennas. The frequency-flat MIMO channel matrix is in this case an $r \times m$ mixing matrix $\mathbf{A}$. We use the classical signal model used in source separation. The matrix corresponding to the received signal may be written as $\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{N}$, where $\mathbf{S}$ is the matrix corresponding to the transmitted signals and $\mathbf{N}$ is the additive white noise. In the prewhitening stage, the received signal is decorrelated based on the eigendecomposition of the correlation matrix. The prewhitened received signal is given by $\mathbf{Y} = \hat{\mathbf{D}}^{-1/2}\hat{\mathbf{E}}^H\mathbf{X}$, where $\hat{\mathbf{E}}$ and $\hat{\mathbf{D}}$ contain the eigenvectors and the eigenvalues corresponding to the signal subspace, respectively.

In the second stage, the goal is to determine a unitary matrix $\mathbf{W}$ such that the estimated signals $\hat{\mathbf{S}} = \mathbf{W}^H\mathbf{Y}$ are the transmitted signals up to a phase and a permutation ambiguity, which are inherent to any blind method. The unitary matrix may be obtained by exploiting the information provided by the fourth-order cumulants of the whitened signals. The JADE algorithm minimizes the following criterion:

$$\mathcal{J}_{\mathrm{JADE}}(\mathbf{W}) = \sum_i \mathrm{off}\,(\mathbf{W}^H\hat{\mathbf{M}}_i\mathbf{W}) \qquad (24)$$

with respect to $\mathbf{W}$, under the unitarity constraint on $\mathbf{W}$, i.e., we deal with a minimization problem on $U(n)$. The eigenmatrices $\hat{\mathbf{M}}_i$, which are estimated from the fourth-order cumulants, need to be diagonalized. The operator $\mathrm{off}(\cdot)$ computes the sum of the squared magnitudes of the off-diagonal elements of a matrix; therefore, the criterion penalizes the departure of all eigenmatrices from the diagonal property. The Euclidean gradient of the JADE cost function is $\Gamma_{\mathbf{W}} = 2\sum_i \hat{\mathbf{M}}_i\mathbf{W}\,[(\mathbf{W}^H\hat{\mathbf{M}}_i\mathbf{W}) \odot (\mathbf{1}\mathbf{1}^T - \mathbf{I})]$, where $\odot$ denotes the elementwise matrix multiplication.
trices from the diagonal property. The Euclidean gradient of the versus the number of iterations is shown in subplot a) of Fig. 5.
JADE cost function is Subplot b) of Fig. 5 shows the evolution of the Amari distance
, where denotes the elementwise matrix multi- with the number of iterations. We may notice that the accuracy
plication. of the optimization solution described by the value of the JADE
The performance is studied in terms of convergence speed cost function is very related to the accuracy of solving the entire
considering the JADE criterion and the Amari distance (perfor- source separation, i.e., the Amari distance. The Riemannian SD
mance index) [12]. This JADE criterion (24) is a measure of algorithm (Table I) converges faster compared to the classical


Fig. 5. A comparison between the conventional optimization methods operating on the Euclidean space (the classical SD algorithm with enforced unitarity and the extra-penalty SD method) and the Riemannian SD algorithm from Table II. The horizontal thick dotted line in subplots (a) and (b) represents the solution of the original JADE algorithm [3]. The performance measures are the JADE criterion J(W) (24), the Amari distance d(W), and the unitarity criterion Δ (22) versus the iteration step. The Riemannian SD algorithm outperforms the conventional methods.

Fig. 6. The JADE criterion J(W) (24), the Amari distance d(W), and the unitarity criterion Δ (22) versus the iteration step. A comparison between different local parametrizations on U(n): the geodesic algorithms (continuous line) versus the nongeodesic algorithms (dashed line). For the geodesic algorithms, the exponential map is computed by using three different methods: Matlab's diagonal Padé approximation with scaling and squaring (DPA+SS), GPD-IZ [30], and the CT. For the nongeodesic algorithms, the projection operation is computed with two different methods: the OP via Matlab's svd function and the approximation of it (AOP) via QR decomposition. The horizontal thick dotted line in subplots (a) and (b) represents the solution of the original JADE algorithm [3].

The Riemannian SD algorithm (Table I) converges faster than the classical methods, i.e., the Euclidean SD with enforced unitarity and the extra-penalty method. The Euclidean SD and the extra-penalty method do not operate in an appropriate parameter space, and their convergence speed is therefore decreased. All SD algorithms satisfy the unitary constraint perfectly, except for the extra-penalty method, which also achieves the lowest convergence speed. This is due to the fact that an optimum scalar weighting parameter may not exist. The unitary matrix constraint is equivalent to a set of smooth real Lagrangian constraints, so more weighting parameters could be used. However, computing these parameters may be computationally expensive and/or nontrivial even in the simple case considered in the example of Section II-B. The accuracy of the solution in terms of the unitary constraint is shown in subplot c) of Fig. 5, where the criterion (22) is plotted versus the number of iterations.
the number of iterations. similar performance, regardless of the local parametrization,
We will next analyze how the choice of the local parametriza- as shown in subplots a) and b) of Fig. 6. Also in terms of
tion affects the performance of the SD algorithms. The results unitarity criterion, all algorithms provide good performance.
are compared in terms of convergence speed and the accuracy of The solution of the original JADE algorithm (represented by
satisfying the unitary constraint. Two classes of Riemannian SD the horizontal thick dotted line in Fig. 6) is achieved in less
algorithms are considered. The first one includes the geodesic than 10 iteration for all SD algorithms, regardless of the local
SD algorithms and the second one include the nongeodesic SD parametrization.
algorithms. For the geodesic SD algorithms the exponential In conclusion, the choice of the local parametrization in
map is computed by three different methods: the Matlab’s made according to the computational complexity and numer-
expm function which uses the diagonal Padé approximation ical stability. An Armijo step size rule is very suitable to the
with scaling and squaring [32], the General- geodesic algorithms, i.e., using the exponential map as a local
ized Polar Decomposition of order four by Iselres and Zanna parametrization. If implemented in parallel architecture the

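To make the geodesic alternative concrete, one multiplicative update along a geodesic of U(n) may be sketched as follows (the skew-Hermitian direction follows the general pattern of the proposed algorithms, but the fixed step size shown here replaces the Armijo rule, and the function name is an assumption):

    import numpy as np
    from scipy.linalg import expm

    def geodesic_sd_step(W, grad, mu):
        # Form a skew-Hermitian direction from the Euclidean gradient.
        G = grad @ W.conj().T - W @ grad.conj().T
        # Move along the geodesic; expm(-mu * G) is unitary because G is
        # skew-Hermitian, so W remains on U(n) up to rounding errors.
        return expm(-mu * G) @ W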
In conclusion, the choice of the local parametrization is made according to the computational complexity and numerical stability. An Armijo step size rule is very well suited to the geodesic algorithms, i.e., to using the exponential map as a local parametrization. If implemented in a parallel architecture, the GPD-IZ method [30] for computing the matrix exponential is a reliable choice from the point of view of efficiency and computational speed. Otherwise, the CT provides reasonable performance at a low computational cost. Finally, the proposed algorithm has a lower computational complexity per iteration than the nongeodesic SD in [2], [19] at the same convergence speed. This reduction is achieved by exploiting the Lie group structure of the Stiefel manifold of unitary matrices.
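Assuming that CT denotes the Cayley transform, i.e., the (1,1) diagonal Padé approximant of the matrix exponential, the serial low-cost alternative can be sketched as follows; for a skew-Hermitian argument the result is exactly unitary, which is why no re-orthogonalization step is needed:

    import numpy as np

    def cayley_approx_expm(A):
        # (I - A/2)^{-1} (I + A/2) matches expm(A) up to second order and is
        # exactly unitary whenever A is skew-Hermitian.
        I = np.eye(A.shape[0], dtype=A.dtype)
        return np.linalg.solve(I - 0.5 * A, I + 0.5 * A)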
Fig. 7. The deviation from the unitary constraint for different values of the quantization errors on the rotation matrices P̃ after one million iterations. The theoretical value E[Δ] in (23) is represented by a continuous black line. The unitarity criterion Δ (22) obtained by repeatedly multiplying unitary matrices is represented by a dashed gray line. The value Δ obtained by using the proposed SD algorithm in Table II is represented by a dot-dashed thick black line. The theoretical expected value of the deviation from the unitary constraint predicts accurately the value obtained in simulations. The proposed algorithm produces an error lower than the theoretical bound due to the fact that the error does not accumulate after convergence has been reached.

The last simulation shows how the round-off errors caused by finite numerical precision affect the proposed iterative algorithm. In Fig. 7, different values of the quantization error are considered for the rotation matrices P̃, which are obtained by using the DPA+SS method [32]. Similar results are obtained by using the other approximation methods of the matrix exponential presented in Section V-A. The theoretical value E[Δ] of the unitarity criterion in (23) is represented by continuous lines. The value Δ (22) obtained by repeatedly multiplying unitary matrices is represented by dashed lines, and the value obtained by using the proposed algorithm in Table II is represented by dot-dashed thick lines. The theoretical value (23) accurately predicts the value (22) obtained by repeated multiplications of unitary matrices. The proposed algorithm exhibits an error below the theoretical value due to the fact that convergence is reached in a few steps. Even if convergence were reached only after a much larger number of iterations, the error accumulation is negligible for reasonable values of the quantization errors, as shown in Fig. 7. In practice, a much smaller number of iterations needs to be performed.
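The round-off experiment may be replicated in outline as follows (a sketch only: the exact definition of the unitarity criterion (22) and the quantization model are assumptions chosen to illustrate how the drift is measured):

    import numpy as np
    from scipy.linalg import expm

    def unitarity_criterion(W):
        # Frobenius-norm deviation from unitarity, in the spirit of (22).
        return np.linalg.norm(W.conj().T @ W - np.eye(W.shape[0])) ** 2

    def drift_after_repeated_rotations(n=4, iters=10000, eps=1e-8, seed=0):
        # Multiply perturbed unitary rotations and measure accumulated drift.
        rng = np.random.default_rng(seed)
        W = np.eye(n, dtype=complex)
        for _ in range(iters):
            A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
            A = 0.5 * (A - A.conj().T)                 # skew-Hermitian
            P = expm(0.01 * A)                         # exactly unitary
            P = P + eps * rng.standard_normal((n, n))  # quantization error
            W = P @ W
        return unitarity_criterion(W)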
VIII. CONCLUSION

In this paper, Riemannian optimization algorithms on the Lie group of unitary matrices U(n) have been introduced. The expression for the Riemannian gradient needed in the optimization has been derived. The proposed algorithms move towards the optimum along geodesics, and the local parametrization is the exponential map. We exploit the recent developments in computing the matrix exponential needed in the multiplicative update on U(n). This operation may be efficiently computed in a parallel fashion by using the GPD-IZ method [30], or in a serial fashion by using the CT. We also address the numerical issues and show that the geodesic algorithms together with the Armijo rule [1] are more efficient in practical implementations. Nongeodesic algorithms have been considered as well, and equivalence up to a certain approximation order has been established.

The proposed geodesic algorithms are suitable for practical applications where a closed-form solution does not exist, or to refine estimates obtained by classical means. Such an example is the joint diagonalization problem presented in this paper. We have shown that the unitary matrix optimization problem encountered in the JADE approach to blind source separation [3] may be efficiently solved by using the proposed algorithms. Other possible applications include smart antenna algorithms, wireless communications, and biomedical measurements and signal separation, where unitary matrices play an important role in general. The algorithms introduced in this paper provide significant advantages over the classical Euclidean gradient with enforced unitary constraint and over Lagrangian types of methods, in terms of convergence speed and accuracy of the solution. The unitary constraint is automatically maintained at each iteration and, consequently, undesired suboptimal solutions may be avoided. Moreover, for the specific case of U(n), the proposed algorithm has a lower computational complexity than the nongeodesic SD algorithms in [2].

REFERENCES

[1] E. Polak, Optimization: Algorithms and Consistent Approximations. New York: Springer-Verlag, 1997.
[2] J. H. Manton, "Optimization algorithms exploiting unitary constraints," IEEE Trans. Signal Process., vol. 50, pp. 635-650, Mar. 2002.
[3] J. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," Inst. Elect. Eng. Proc.-F, vol. 140, no. 6, pp. 362-370, 1993.
[4] S. T. Smith, "Subspace tracking with full rank updates," in Conf. Rec. 31st Asilomar Conf. Signals, Syst., Comput., Nov. 2-5, 1997, vol. 1, pp. 793-797.
[5] D. R. Fuhrmann, "A geometric approach to subspace tracking," in Conf. Rec. 31st Asilomar Conf. Signals, Syst., Comput., Nov. 2-5, 1997, vol. 1, pp. 783-787.
[6] J. Yang and D. B. Williams, "MIMO transmission subspace tracking with low rate feedback," in Proc. Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. 3, pp. 405-408.
[7] W. Utschick and C. Brunner, "Efficient tracking and feedback of DL-eigenbeams in WCDMA," in Proc. 4th Europ. Pers. Mobile Commun. Conf., Vienna, Austria, 2001.
[8] M. Wax and Y. Anu, "A new least squares approach to blind beamforming," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 21-24, 1997, vol. 5, pp. 3477-3480.
[9] P. Stoica and D. A. Linebarger, "Optimization result for constrained beamformer design," IEEE Signal Process. Lett., vol. 2, pp. 66-67, Apr. 1995.
[10] Y. Nishimori, "Learning algorithm for independent component analysis by geodesic flows on orthogonal group," in Proc. Int. Joint Conf. Neural Netw., Jul. 10-16, 1999, vol. 2, pp. 933-938.
[11] Y. Nishimori and S. Akaho, "Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold," Neurocomputing, vol. 67, pp. 106-135, Jun. 2005.
[12] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[13] S. Fiori, A. Uncini, and F. Piazza, "Application of the MEC network to principal component analysis and source separation," in Proc. Int. Conf. Artif. Neural Netw., 1997, pp. 571-576.
[14] M. D. Plumbley, "Geometrical methods for non-negative ICA: Manifolds, Lie groups, toral subalgebras," Neurocomput., vol. 67, pp. 161-197, 2005.
[15] A. Cichocki and S.-I. Amari, Adaptive Blind Signal and Image Processing. New York: Wiley, 2002.
[16] L. Wang, J. Karhunen, and E. Oja, "A bigradient optimization approach for robust PCA, MCA and source separation," in Proc. IEEE Conf. Neural Netw., 27 Nov.-1 Dec. 1995, vol. 4, pp. 1684-1689.
[17] S. C. Douglas, "Self-stabilized gradient algorithms for blind source separation with orthogonality constraints," IEEE Trans. Neural Netw., vol. 11, pp. 1490-1497, Nov. 2000.
[18] S.-I. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, no. 2, pp. 251-276, 1998.
[19] M. Nikpour, J. H. Manton, and G. Hori, "Algorithms on the Stiefel manifold for joint diagonalisation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2002, vol. 2, pp. 1481-1484.
[20] C. B. Papadias, "Globally convergent blind source separation based on a multiuser kurtosis maximization criterion," IEEE Trans. Signal Process., vol. 48, pp. 3508-3519, Dec. 2000.
[21] P. Sansrimahachai, D. Ward, and A. Constantinides, "Multiple-input multiple-output least-squares constant modulus algorithms," in Proc. IEEE Global Telecommun. Conf., Dec. 1-5, 2003, vol. 4, pp. 2084-2088.
[22] C. B. Papadias and A. M. Kuzminskiy, "Blind source separation with randomized Gram-Schmidt orthogonalization for short burst systems," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 17-21, 2004, vol. 5, pp. 809-812.
[23] J. Lu, T. N. Davidson, and Z. Luo, "Blind separation of BPSK signals using Newton's method on the Stiefel manifold," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2003, vol. 4, pp. 301-304.
[24] A. Edelman, T. Arias, and S. Smith, "The geometry of algorithms with orthogonality constraints," SIAM J. Matrix Anal. Applicat., vol. 20, no. 2, pp. 303-353, 1998.
[25] S. Fiori, "Stiefel-Grassman Flow (SGF) learning: Further results," in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw., Jul. 24-27, 2000, vol. 3, pp. 343-348.
[26] J. H. Manton, R. Mahony, and Y. Hua, "The geometry of weighted low-rank approximations," IEEE Trans. Signal Process., vol. 51, pp. 500-514, Feb. 2003.
[27] S. Fiori, "Quasi-geodesic neural learning algorithms over the orthogonal group: A tutorial," J. Mach. Learn. Res., vol. 1, pp. 1-42, Apr. 2005.
[28] D. G. Luenberger, "The gradient projection method along geodesics," Manage. Sci., vol. 18, pp. 620-631, 1972.
[29] D. Gabay, "Minimizing a differentiable function over a differential manifold," J. Optim. Theory Applicat., vol. 37, pp. 177-219, Jun. 1982.
[30] A. Iserles and A. Zanna, "Efficient computation of the matrix exponential by general polar decomposition," SIAM J. Numer. Anal., vol. 42, pp. 2218-2256, Mar. 2005.
[31] I. Yamada and T. Ezaki, "An orthogonal matrix optimization by dual Cayley parametrization technique," in Proc. ICA, 2003, pp. 35-40.
[32] C. Moler and C. van Loan, "Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later," SIAM Rev., vol. 45, no. 1, pp. 3-49, 2003.
[33] S. Douglas and S.-Y. Kung, "An ordered-rotation kuicnet algorithm for separating arbitrarily-distributed sources," in Proc. IEEE Int. Conf. Independ. Compon. Anal. Signal Separat., Aussois, France, Jan. 1999, pp. 419-425.
[34] C. Udriste, Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and Its Applications. Boston, MA: Kluwer Academic, 1994.
[35] M. P. do Carmo, Riemannian Geometry. Mathematics: Theory and Applications. Boston, MA: Birkhauser, 1992.
[36] A. Knapp, Lie Groups Beyond an Introduction, vol. 140 of Progress in Mathematics. Boston, MA: Birkhauser, 1996.
[37] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.
[38] S. T. Smith, "Optimization techniques on Riemannian manifolds," Fields Inst. Commun., Amer. Math. Soc., vol. 3, pp. 113-136, 1994.
[39] R. W. Brockett, "Least squares matching problems," Linear Algebra Applicat., vol. 122/123/124, pp. 761-777, 1989.
[40] R. W. Brockett, "Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems," Linear Algebra Applicat., vol. 146, pp. 79-91, 1991.
[41] B. Owren and B. Welfert, "The Newton iteration on Lie groups," BIT Numer. Math., vol. 40, pp. 121-145, Mar. 2000.
[42] P.-A. Absil, R. Mahony, and R. Sepulchre, "Riemannian geometry of Grassmann manifolds with a view on algorithmic computation," Acta Applicandae Mathematicae, vol. 80, no. 2, pp. 199-220, 2004.
[43] S. G. Krantz, Function Theory of Several Complex Variables, 2nd ed. Pacific Grove, CA: Wadsworth and Brooks/Cole Advanced Books and Software, 1992.
[44] S. Smith, "Statistical resolution limits and the complexified Cramér-Rao bound," IEEE Trans. Signal Process., vol. 53, pp. 1597-1609, May 2005.
[45] D. H. Brandwood, "A complex gradient operator and its applications in adaptive array theory," Inst. Elect. Eng. Proc., Parts F and H, vol. 130, pp. 11-16, Feb. 1983.
[46] Y. Yang, "Optimization on Riemannian manifold," in Proc. 38th Conf. Decision Contr., Phoenix, AZ, Dec. 1999, pp. 888-893.
[47] N. J. Higham, "Matrix nearness problems and applications," in Applications of Matrix Theory, M. J. C. Gover and S. Barnett, Eds. Oxford, U.K.: Oxford Univ. Press, 1989, pp. 1-27.
[48] A. Zanna and H. Z. Munthe-Kaas, "Generalized polar decomposition for the approximation of the matrix exponential," SIAM J. Matrix Anal., vol. 23, pp. 840-862, Jan. 2002.
[49] E. Celledoni and A. Iserles, "Methods for approximation of a matrix exponential in a Lie-algebraic setting," IMA J. Numer. Anal., vol. 21, no. 2, pp. 463-488, 2001.
[50] T. Abrudan, J. Eriksson, and V. Koivunen, "Optimization under unitary matrix constraint using approximate matrix exponential," in Conf. Rec. 39th Asilomar Conf. Signals, Syst., Comput., 28 Oct.-1 Nov. 2005.
[51] P. Sansrimahachai, D. Ward, and A. Constantinides, "Blind source separation for BLAST," in Proc. 14th Int. Conf. Digital Signal Process., Jul. 1-3, 2002, vol. 1, pp. 139-142.
[52] J. H. Manton, "On the role of differential geometry in signal processing," in Proc. Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. 5, pp. 1021-1024.
[53] P. Absil and K. A. Gallivan, "Joint diagonalization on the oblique manifold for independent component analysis," DAMTP, Univ. Cambridge, U.K., Tech. Rep. NA2006/01, 2006 [Online]. Available: http://www.damtp.cam.ac.uk/user/na/reports.html
[54] G. H. Golub and C. van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1996.

Traian E. Abrudan (S'02) received the M.Sc. degree from the Technical University of Cluj-Napoca, Romania, in 2000. Since 2001, he has been with the Signal Processing Laboratory, Helsinki University of Technology (HUT), Finland. He is a Ph.D. student with the Electrical and Communications Engineering Department, HUT. Since 2005, he has been a member of GETA, Graduate School in Electronics, Telecommunications and Automation. His current research interests include statistical signal processing and optimization algorithms for wireless communications with emphasis on MIMO and multicarrier systems.

Jan Eriksson (M'04) received the M.Sc. degree in mathematics from the University of Turku, Finland, in 2000, and the D.Sc. (Tech) degree (with honors) in signal processing from Helsinki University of Technology (HUT), Finland, in 2004. Since 2005, he has been working as a postdoctoral researcher with the Academy of Finland. His research interests are in blind signal processing, stochastic modeling, constrained optimization, digital communication, and information theory.

Visa Koivunen (S'87-M'93-SM'98) received the D.Sc. (Tech) degree with honors from the Department of Electrical Engineering, University of Oulu, Finland. He received the primus doctor (best graduate) award among the doctoral graduates during 1989-1994. From 1992 to 1995, he was a visiting researcher at the University of Pennsylvania, Philadelphia. In 1996, he held a faculty position with the Department of Electrical Engineering, University of Oulu. From August 1997 to August 1999, he was an Associate Professor with the Signal Processing Laboratory, Tampere University of Technology, Finland. Since 1999 he has been a Professor of Signal Processing with the Department of Electrical and Communications Engineering, Helsinki University of Technology (HUT), Finland. He is one of the Principal Investigators in the Smart and Novel Radios (SMARAD) Center of Excellence in Radio and Communications Engineering nominated by the Academy of Finland. Since 2003, he has also been an adjunct full professor with the University of Pennsylvania. During his sabbatical leave (2006-2007), he was the Nokia Visiting Fellow at the Nokia Research Center, as well as a Visiting Fellow at Princeton University, Princeton, NJ. His research interests include statistical, communications, and sensor array signal processing. He has published more than 200 papers in international scientific conferences and journals.

Dr. Koivunen coauthored the papers receiving the Best Paper award in IEEE PIMRC 2005, EUSIPCO 2006, and EuCAP 2006. He served as an Associate Editor for IEEE SIGNAL PROCESSING LETTERS. He is a member of the editorial board for the Signal Processing journal and the Journal of Wireless Communication and Networking. He is also a member of the IEEE Signal Processing for Communication Technical Committee (SPCOM-TC). He was the general chair of the IEEE SPAWC (Signal Processing Advances in Wireless Communication) conference held in Helsinki, June 2007.
