Article
Modified Jacobi-Gradient Iterative Method for
Generalized Sylvester Matrix Equation
Nopparut Sasaki and Pattrawut Chansangiam *
Department of Mathematics, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang,
Bangkok 10520, Thailand; [email protected]
* Correspondence: [email protected]; Tel.: +66-935-266600

Received: 14 October 2020; Accepted: 2 November 2020; Published: 5 November 2020

Abstract: We propose a new iterative method for solving a generalized Sylvester matrix equation
A1 XA2 + A3 XA4 = E with given square matrices A1, A2, A3, A4 and an unknown rectangular matrix
X. The method aims to construct a sequence of approximated solutions converging to the exact
solution, regardless of the initial value. We decompose each coefficient matrix into the sum of
its diagonal part and the remaining part. The recursive formula for the iteration is derived from the gradients
of quadratic norm-error functions, together with the hierarchical identification principle. We find
equivalent conditions on the convergence factor, based on the eigenvalues of the associated iteration matrix,
so that the method converges as desired. The convergence rate and error estimates of the
method are governed by the spectral norm of the related iteration matrix. Furthermore, we provide
numerical examples to illustrate the capability and efficiency of the proposed method, compared with recent
gradient-based iterative methods.

Keywords: generalized Sylvester matrix equation; iterative method; gradient; Kronecker product;
matrix norm

MSC: 65F45; 15A12; 15A60; 15A69

1. Introduction
In control engineering, certain problems concerning the analysis and design of control systems
can be formulated as the Sylvester matrix equation:

A1 X + XA2 = C (1)

where X ∈ Rm×n is an unknown matrix, and A1, A2, C are known matrices of appropriate dimensions.
Here, Rm×n stands for the set of m × n real matrices. Let us denote by (·)^T the transpose of a matrix.
When A2 = A1^T, the equation reduces to the Lyapunov equation, which often arises in continuous-
and discrete-time stability analysis [1,2]. The Sylvester equation is a special case of a generalized
Sylvester matrix equation:

A1 XA2 + A3 XA4 = E (2)

where X ∈ Rm×n is unknown, and A1, A2, A3, A4, E are known constant matrices of appropriate
dimensions. This equation also includes the equation A1 XA2 = C and the Kalman–Yakubovich equation
A1 XA2 − X = C as special cases. All of these equations have profound applications in linear system
theory and related areas.
Normally, a direct way to solve the generalized Sylvester Equation (2) is to reduce it to a linear
system by applying the vectorization operator: Equation (2) then becomes Px = b, where


P = A2^T ⊗ A1 + A4^T ⊗ A3,   x = vec(X),   and   b = vec(E).

Here, vec(·) is the vectorization operator and ⊗ is the Kronecker product. So, Equation (2)
has a unique solution if and only if the square matrix P is invertible. However, it is not easy to compute
P^−1 when the sizes of A1, A2, A3, A4 are not small, since the size of P can be very large. Such a size
problem causes computational difficulty: excessive computer memory is required for the inversion
of large matrices. An alternative is to transform the coefficient matrices into a Schur or
Hessenberg form, for which solutions may be readily computed—see [3,4].
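To make the vectorization route concrete, the following NumPy sketch (function name and structure are ours, not from the paper) assembles P and solves P vec(X) = vec(E) directly; it is only practical when mn is small, for exactly the memory reasons mentioned above.

```python
import numpy as np

def solve_gen_sylvester_direct(A1, A2, A3, A4, E):
    """Direct solution of A1 X A2 + A3 X A4 = E via vectorization.

    Builds P = A2^T (x) A1 + A4^T (x) A3 and solves P vec(X) = vec(E).
    """
    m, n = E.shape
    P = np.kron(A2.T, A1) + np.kron(A4.T, A3)
    b = E.flatten(order="F")          # column-stacking vec(E)
    x = np.linalg.solve(P, b)         # fails if P is singular (no unique solution)
    return x.reshape((m, n), order="F")
```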
For matrix equations of large dimensions, iterative algorithms that find an approximate/exact
solution are of great interest. There are many techniques to construct an iterative procedure for Equation (2) or
its special cases—e.g., the matrix sign function [5], block successive over-relaxation [6], block recursion [7,8],
Krylov subspaces [9,10], and truncated low-rank algorithms [11]. Lately, there have been some variants of
Hermitian and skew-Hermitian splitting—e.g., a generalized modified Hermitian and skew-Hermitian
splitting algorithm [12], an accelerated double-step scale splitting algorithm [13], the PHSS algorithm [14],
and the four-parameter PSS algorithm [15]. Furthermore, the idea of the conjugate gradient leads to finite-step
iterative methods that find the exact solution, such as the generalized conjugate direction algorithm [16],
the conjugate gradient least-squares algorithm [17], and generalized product-type methods based on the
bi-conjugate gradient algorithm [18].
In the last decade, many authors have developed gradient-based iterative (GI) algorithms for
certain linear matrix equations that satisfy the asymptotic stability (AS) in the following meaning:

(AS): The sequence of approximated solutions converges to the exact solution, regardless of the
initial value.

The first GI algorithm for solving (1) was developed by Ding and Chen [19]. In that paper, a sufficient
condition in terms of a convergence factor is determined so that the algorithm satisfies the (AS) property.
By introducing a relaxation parameter, Niu et al. [20] suggested a relaxed gradient-based iterative (RGI)
algorithm for solving (1). Numerical studies show that when the relaxation factor is suitably
selected, the convergence behavior of Niu's algorithm is better than that of Ding's algorithm. Zhang
and Sheng [21] introduced an RGI algorithm for finding the symmetric (skew-symmetric) solution of
Equation (1). Xie et al. [22] improved the RGI algorithm into an accelerated GI (AGBI) algorithm,
on the basis of the information generated in the previous half-step and a relaxation factor. Ding and
Chen [23] also applied the ideas of gradients and least squares to formulate the least-squares iterative
(LSI) algorithm. In [24], Fan et al. observed that the matrix multiplications in GI would take considerable time
and space if A1 and A2 were large and dense, so they proposed the following Jacobi-gradient iterative
(JGI) method.

Method 1 (Jacobi-Gradient based Iterative (JGI) algorithm [24]). For i = 1, 2, let Di be the diagonal part
of Ai . Given any initial matrices X1 (0), X2 (0). Set k = 0 and compute X (0) = (1/2)( X1 (0) + X2 (0)).
For k = 1, 2, . . . , End, do:

X1(k) = X(k − 1) + µD1[C − A1X(k − 1) − X(k − 1)A2],
X2(k) = X(k − 1) + µ[C − A1X(k − 1) − X(k − 1)A2]D2,
X(k) = (1/2)(X1(k) + X2(k)).

After that, Tian et al. [25] proposed an accelerated Jacobi-gradient iterative (AJGI) algorithm
for solving the Sylvester matrix equation, which relies on two relaxation factors and a half-step update.
However, the parameter values to apply in the algorithm are difficult to find since they are given in
terms of a nonlinear inequality. For the generalized Sylvester Equation (2), the gradient iterative (GI)
algorithm [19] and the least-squares iterative (LSI) algorithm [26] were established as follows.

Method 2 (GI algorithm [19]). Given any two initial matrices X1 (0), X2 (0). Set k = 0 and compute
X (0) = (1/2)( X1 (0) + X2 (0)). For k = 1, 2, . . . , End, do:

X1(k) = X(k − 1) + µA1^T[E − A1X(k − 1)A2 − A3X(k − 1)A4]A2^T,
X2(k) = X(k − 1) + µA3^T[E − A1X(k − 1)A2 − A3X(k − 1)A4]A4^T,
X(k) = (1/2)(X1(k) + X2(k)).
A sufficient condition for which the algorithm satisfies (AS) is

0 < µ < 2 / (‖A1‖₂² ‖A2‖₂² + ‖A3‖₂² ‖A4‖₂²).
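As a concrete illustration of Method 2, one GI sweep can be written as follows in NumPy (a sketch under our own naming; the bound on µ is the sufficient condition quoted above).

```python
import numpy as np

def gi_step(X, A1, A2, A3, A4, E, mu):
    # One iteration of the GI algorithm for A1 X A2 + A3 X A4 = E.
    R = E - A1 @ X @ A2 - A3 @ X @ A4           # shared residual
    X1 = X + mu * A1.T @ R @ A2.T
    X2 = X + mu * A3.T @ R @ A4.T
    return 0.5 * (X1 + X2)

def gi_mu_upper_bound(A1, A2, A3, A4):
    # 2 / (||A1||_2^2 ||A2||_2^2 + ||A3||_2^2 ||A4||_2^2)
    s = lambda A: np.linalg.norm(A, 2)
    return 2.0 / (s(A1) ** 2 * s(A2) ** 2 + s(A3) ** 2 * s(A4) ** 2)
```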

Method 3 (LSI algorithm [26]). Given any two initial matrices X1(0), X2(0). Set k = 0 and compute
X(0) = (1/2)(X1(0) + X2(0)). For k = 1, 2, . . . , End, do:

X1(k) = X(k − 1) + µ(A1^T A1)^−1 A1^T[E − A1X(k − 1)A2 − A3X(k − 1)A4]A2^T(A2 A2^T)^−1,
X2(k) = X(k − 1) + µ(A3^T A3)^−1 A3^T[E − A1X(k − 1)A2 − A3X(k − 1)A4]A4^T(A4 A4^T)^−1,
X(k) = (1/2)(X1(k) + X2(k)).
If 0 < µ < 4, then the algorithm satisfies (AS).

In this paper, we shall propose a new iterative method for solving the generalized Sylvester matrix
Equation (2), where A1, A3 ∈ Rm×m, A2, A4 ∈ Rn×n and X, E ∈ Rm×n. The algorithm requires only one
initial value X(0) and only one parameter, called a convergence factor. We decompose each coefficient
matrix into the sum of its diagonal part and the remaining part. The recursive formula for the iteration is derived
from the gradients of quadratic norm-error functions together with the hierarchical identification principle.
Under assumptions on the signs of the real parts of the eigenvalues associated with the coefficient matrices, we find necessary and
sufficient conditions on the convergence factor for which (AS) holds. The convergence rate and error
estimates are governed by the spectral radius of the iteration matrix. In particular, when the iteration matrix
is symmetric, we obtain convergence criteria, error estimates and the optimal convergence factor in
terms of spectral norms and the condition number. Moreover, numerical simulations are also provided
to illustrate our results for (2) and (1). We compare the efficiency of our algorithm with the LSI, GI, RGI,
AGBI and JGI algorithms.
Let us recall some terminology from matrix analysis—see e.g., [27]. For any square matrix X,
denote by σ(X) its spectrum, ρ(X) its spectral radius, and tr(X) its trace. Let us denote the largest and
the smallest eigenvalues of a matrix by λmax(·) and λmin(·), respectively. Recall that the spectral norm
‖·‖₂ and the Frobenius norm ‖·‖_F of A ∈ Rm×n are, respectively, defined by

‖A‖₂ = sqrt(λmax(A^T A))   and   ‖A‖_F = sqrt(tr(A^T A)).

The condition number of A ≠ 0 is defined by

κ(A) = sqrt(λmax(A^T A) / λmin(A^T A)).

Denote the real part of a complex number z by ℜ(z).
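These quantities are straightforward to evaluate numerically; a small NumPy sketch (helper names are ours) is given below.

```python
import numpy as np

def spectral_norm(A):
    # ||A||_2 = sqrt(lambda_max(A^T A)), i.e., the largest singular value of A.
    return np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))

def frobenius_norm(A):
    # ||A||_F = sqrt(tr(A^T A)).
    return np.sqrt(np.trace(A.T @ A))

def condition_number(A):
    # kappa(A) = sqrt(lambda_max(A^T A) / lambda_min(A^T A)).
    eigs = np.linalg.eigvalsh(A.T @ A)
    return np.sqrt(eigs.max() / eigs.min())
```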


The rest of the paper is organized as follows. We propose a modified Jacobi-gradient iterative algorithm
in Section 2. Convergence criteria, the convergence rate, error estimates, and the optimal convergence factor
are discussed in Section 3. In Section 4, we provide numerical simulations of the algorithm. Finally,
we conclude the paper in Section 5.

2. A Modified Jacobi-Gradient Iterative Method for the Generalized Sylvester Equation


In this section, we propose an iterative algorithm for solving the generalized Sylvester equation,
called a modified Jacobi-gradient iterative algorithm.
Throughout, let m, n ∈ N and A1 , A3 ∈ Rm×m , A2 , A4 ∈ Rn×n and E ∈ Rm×n . We would like to
find a matrix X ∈ Rm×n , such that

A1 XA2 + A3 XA4 = E. (3)

Write A1 = D1 + F1, A2 = D2 + F2, A3 = D3 + F3 and A4 = D4 + F4, where D1, D2, D3, D4 are the
diagonal parts of A1, A2, A3, A4, respectively. A necessary and sufficient condition for (3) to have a
unique solution is the invertibility of the square matrix

P := A2^T ⊗ A1 + A4^T ⊗ A3.

In this case, the solution is given by vec X = P^−1 vec E.


To obtain an iterative algorithm for solving (3), we recall the hierarchical identification principle
in [19]. We rewrite (3) as

( D1 + F1 ) X ( D2 + F2 ) + A3 XA4 = E, (4)
A1 XA2 + ( D3 + F3 ) X ( D4 + F4 ) = E. (5)

Define two matrices


M := E − F1 XD2 − D1 XF2 − F1 XF2 − A3 XA4 ,

N := E − F3 XD4 − D3 XF4 − F3 XF4 − A1 XA2 .

From (4) and (5), we shall find approximate solutions of the following two subsystems

D1 XD2 = M and D3 XD4 = N (6)

so that the following norm-error functions are minimized:

L1(X) := ‖D1XD2 − M‖_F²   and   L2(X) := ‖D3XD4 − N‖_F².   (7)

From the gradient formula

d/dX tr(AX) = A^T,

we can deduce the gradient of the error L1 as follows:

∂/∂X L1(X) = ∂/∂X tr[(D1XD2 − M)^T (D1XD2 − M)]
           = ∂/∂X tr(XD2D2X^T D1D1) − ∂/∂X tr(XD2M^T D1) − ∂/∂X tr(X^T D1MD2)
           = 2D1D1XD2D2 − D1MD2 − (D2M^T D1)^T
           = 2D1(D1XD2 − M)D2.   (8)

Similarly, we have

∂/∂X L2(X) = 2D3(D3XD4 − N)D4.   (9)

Let X1(k) and X2(k) be the estimates or iterative solutions of the system (6) at the k-th iteration. The recursive
formulas for X1(k) and X2(k) come from the gradient formulas (8) and (9), as follows:

X1(k) = X(k − 1) + µD1(M − D1X(k − 1)D2)D2
      = X(k − 1) + µD1(E − A1X(k − 1)A2 − A3X(k − 1)A4)D2,
X2(k) = X(k − 1) + µD3(N − D3X(k − 1)D4)D4
      = X(k − 1) + µD3(E − A1X(k − 1)A2 − A3X(k − 1)A4)D4.

Based on the hierarchical identification principle, the unknown variable X is replaced by its estimates
at the (k − 1)-th iteration. To avoid duplicated computation, we introduce a matrix

S ( k ) = E − ( A1 X ( k ) A2 + A3 X ( k ) A4 ),

so we have
X(k) = (1/2)(X1(k) + X2(k)) = X(k − 1) + µ(D1S(k − 1)D2 + D3S(k − 1)D4).   (10)
Since any diagonal matrix is sparse, the operation counts in the computation (10) can be substantially
reduced. Let us denote S(k) = [s_ij(k)], X(k) = [x_ij(k)], and D_l = [d_ij^(l)] for each l = 1, 2, 3, 4. Indeed,
the multiplication D1S(k)D2 results in a matrix whose (i, j)-th entry is the product of the i-th diagonal
entry of D1, the (i, j)-th entry of S(k), and the j-th diagonal entry of D2—i.e., D1S(k)D2 = [d_ii^(1) s_ij(k) d_jj^(2)].
Similarly, D3S(k)D4 = [d_ii^(3) s_ij(k) d_jj^(4)]. Thus,

D1S(k)D2 + D3S(k)D4 = [(d_ii^(1) d_jj^(2) + d_ii^(3) d_jj^(4)) s_ij(k)].
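In code, this observation means the update (10) needs only the diagonal vectors and one elementwise product; the NumPy sketch below (our own helper) illustrates it.

```python
import numpy as np

def diagonal_update(S, d1, d2, d3, d4):
    """Compute D1 S D2 + D3 S D4 from the diagonal vectors d1, ..., d4 only.

    The (i, j) entry is (d1[i] * d2[j] + d3[i] * d4[j]) * S[i, j], so no full
    matrix products with the (sparse) diagonal matrices are formed.
    """
    W = np.outer(d1, d2) + np.outer(d3, d4)
    return W * S                       # Hadamard (entrywise) product
```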

The above discussion leads to the following Algorithm 1.

Algorithm 1: Modified Jacobi-gradient based iterative (MJGI) algorithm

Input: A1, A2, A3, A4, E, X(0);
Choose µ ∈ R, e > 0 and set k = 1;
for k = 1, . . . , n do
    S(k − 1) = E − (A1X(k − 1)A2 + A3X(k − 1)A4);
    x_ij(k) = x_ij(k − 1) + µ(d_ii^(1) d_jj^(2) + d_ii^(3) d_jj^(4)) s_ij(k − 1);
    if ‖S(k − 1)‖_F < e then
        break;
    else
        k = k + 1;
    end
end
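A minimal NumPy implementation of Algorithm 1, written as a sketch under our own naming (the convergence factor µ still has to be supplied according to Theorem 1 below), could look as follows.

```python
import numpy as np

def mjgi(A1, A2, A3, A4, E, X0, mu, tol=1e-8, max_iter=1000):
    """Modified Jacobi-gradient iterative (MJGI) sketch for A1 X A2 + A3 X A4 = E."""
    # Only the diagonals of the coefficient matrices enter the update.
    W = np.outer(np.diag(A1), np.diag(A2)) + np.outer(np.diag(A3), np.diag(A4))
    X = X0.copy()
    for k in range(1, max_iter + 1):
        S = E - (A1 @ X @ A2 + A3 @ X @ A4)      # residual S(k-1)
        if np.linalg.norm(S, "fro") < tol:        # stopping criterion
            break
        X = X + mu * W * S                        # entrywise update of x_ij(k)
    return X, k
```

For instance, Example 2 below corresponds to a call such as mjgi(A1, A2, A3, A4, E, np.zeros((10, 10)), mu) with a µ admitted by Theorem 1.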

The operation count for each step of the algorithm is 2mn(m + n + 5). When m = n, this count is
4n³ + 10n² ∈ O(n³), so the runtime of each step is cubic in n.
The convergence property of the algorithm relies on the convergence factor µ. The appropriate value
of this parameter is determined in the next section.

3. Convergence Analysis of the Proposed Method


In this section, we carry out a convergence analysis of Algorithm 1. First, we transform it into a first-order
linear iterative process x(k) = T x(k − 1), where x(k) is a vector variable and T is a matrix. The iteration
matrix T will reflect the convergence criteria, convergence rate, and error estimates of the algorithm.

3.1. Convergence Criteria


Theorem 1. Assume that the generalized Sylvester matrix Equation (3) has a unique solution X*. Denote H = D(P)P,
where D(P) is the diagonal part of P, and write σ(H) = {λ1, . . . , λmn}. Let {X(k)} be a sequence generated from Algorithm 1.

(1) Then, (AS) holds if and only if ρ(Imn − µH) < 1.

(2) If ℜ(λj) > 0 for all j = 1, . . . , mn, then (AS) holds if and only if

    0 < µ < min_{j=1,...,mn} 2ℜ(λj) / |λj|².

(3) If ℜ(λj) < 0 for all j = 1, . . . , mn, then (AS) holds if and only if

    max_{j=1,...,mn} 2ℜ(λj) / |λj|² < µ < 0.

(4) If H is symmetric, then (AS) holds if and only if λmax(H) and λmin(H) have the same sign, and µ is
chosen so that

    0 < µ < 2/λmax(H)   if λmin(H) > 0,
    2/λmin(H) < µ < 0   if λmax(H) < 0.

Proof. From Algorithm 1, we start by considering the error matrices

X̃(k) = X(k) − X*,   X̃1(k) = X1(k) − X*   and   X̃2(k) = X2(k) − X*.

We will show that X̃(k) → 0 or, equivalently, vec X̃(k) → 0 as k → ∞. A direct computation reveals that

X̃(k) = (1/2)(X̃1(k) + X̃2(k))
     = X̃(k − 1) − (1/2)µD1(A1X̃(k − 1)A2 + A3X̃(k − 1)A4)D2
                 − (1/2)µD3(A1X̃(k − 1)A2 + A3X̃(k − 1)A4)D4.
By taking the vector operator and using properties of the Kronecker product, we have

vec X̃(k) = vec X̃(k − 1)
          − µ vec(D1A1X̃(k − 1)A2D2 + D1A3X̃(k − 1)A4D2)
          − µ vec(D3A1X̃(k − 1)A2D4 + D3A3X̃(k − 1)A4D4)
        = {Imn − µ[(D2 ⊗ D1)(A2^T ⊗ A1) + (D2 ⊗ D1)(A4^T ⊗ A3)
                  + (D4 ⊗ D3)(A2^T ⊗ A1) + (D4 ⊗ D3)(A4^T ⊗ A3)]} vec X̃(k − 1)
        = [Imn − µ(D2 ⊗ D1 + D4 ⊗ D3)(A2^T ⊗ A1 + A4^T ⊗ A3)] vec X̃(k − 1).

Let us denote the diagonal part of P by D(P). Indeed,

D(P) = D2 ⊗ D1 + D4 ⊗ D3.

Thus, we arrive at a linear iterative process

vec X̃(k) = [Imn − µH] vec X̃(k − 1),   (11)

where H = D(P)P. Hence, the following statements are equivalent:



(i) vec X̃ (k) → 0 for any initial value vec X̃ (0).


(ii) System (11) has an asymptotically-stable zero solution.
(iii) The iteration matrix Imn − µH has spectral radius less than 1.

Indeed, since Imn − µH is a polynomial of H, we get

ρ(Imn − µH) = max_{λ∈σ(H)} |1 − µλ|.   (12)

Thus, ρ(Imn − µH) < 1 if and only if |1 − µλ| < 1 for all λ ∈ σ(H). Write λj = aj + i bj, where aj, bj ∈ R.
It follows that the condition |1 − µλj| < 1 is equivalent to (1 − µλj)(1 − µλ̄j) < 1, or

µ(−2aj + µ(aj² + bj²)) < 0.

Thus, we arrive at two alternative conditions:

(i) µ > 0 and −2aj + µ(aj² + bj²) < 0 for all j = 1, 2, 3, . . . , mn;
(ii) µ < 0 and −2aj + µ(aj² + bj²) > 0 for all j = 1, 2, 3, . . . , mn.

Case 1: aj = ℜ(λj) > 0 for all j. In this case, ρ(Imn − µH) < 1 if and only if

0 < µ < min_{j=1,...,mn} 2aj / (aj² + bj²).   (13)

Case 2: aj = ℜ(λj) < 0 for all j. In this case, ρ(Imn − µH) < 1 if and only if

max_{j=1,...,mn} 2aj / (aj² + bj²) < µ < 0.   (14)

Now, suppose that H is a symmetric matrix. Then Imn − µH is also symmetric, and thus all its
eigenvalues are real. Hence,

ρ(Imn − µH) = max{|1 − µλmin(H)|, |1 − µλmax(H)|}.   (15)

It follows that ρ(Imn − µH) < 1 if and only if

0 < µλmin(H) < 2   and   0 < µλmax(H) < 2.   (16)

So, λmin(H) and λmax(H) cannot be zero.

Case 1: If λmax(H) ≥ λmin(H) > 0, then the condition (16) is equivalent to

0 < µ < 2/λmax(H).

Case 2: If λmin(H) ≤ λmax(H) < 0, then the condition (16) is equivalent to

2/λmin(H) < µ < 0.

Case 3: If λmin(H) < 0 < λmax(H), then

2/λmin(H) < µ < 0   and   0 < µ < 2/λmax(H),

which is a contradiction.

Therefore, the condition (16) holds if and only if λmax ( H ) and λmin ( H ) have the same sign and µ
is chosen according to the above condition.
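The criteria of Theorem 1 can be verified numerically for moderate mn; the sketch below (our own helper) forms H = D(P)P and returns the admissible interval for µ when all eigenvalues of H have real parts of the same sign.

```python
import numpy as np

def admissible_mu_interval(A1, A2, A3, A4):
    """Admissible range of the convergence factor mu from Theorem 1 (2)-(3)."""
    P = np.kron(A2.T, A1) + np.kron(A4.T, A3)
    H = np.diag(np.diag(P)) @ P                    # H = D(P) P
    lam = np.linalg.eigvals(H)
    bounds = 2.0 * lam.real / np.abs(lam) ** 2     # 2 Re(lambda_j) / |lambda_j|^2
    if np.all(lam.real > 0):
        return 0.0, bounds.min()
    if np.all(lam.real < 0):
        return bounds.max(), 0.0
    raise ValueError("Real parts of sigma(H) change sign; parts (2)-(3) do not apply.")

def iteration_spectral_radius(A1, A2, A3, A4, mu):
    """rho(I - mu*H), which must be strictly less than 1 for (AS)."""
    P = np.kron(A2.T, A1) + np.kron(A4.T, A3)
    H = np.diag(np.diag(P)) @ P
    return np.max(np.abs(1.0 - mu * np.linalg.eigvals(H)))
```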

3.2. Convergence Rate and Error Estimate


We now discuss the convergence rate and error estimates of Algorithm 1 from the iterative
process (11).
Suppose that Algorithm 1 satisfies the (AS) property—i.e., ρ( Imn − µH ) < 1. From (11), we have

‖X(k) − X*‖_F = ‖vec X̃(k)‖_F = ‖(Imn − µH) vec X̃(k − 1)‖_F
              ≤ ‖Imn − µH‖₂ ‖X̃(k − 1)‖_F = ‖Imn − µH‖₂ ‖X(k − 1) − X*‖_F.   (17)

It follows inductively that for each k ∈ N,

‖X(k) − X*‖_F ≤ ‖Imn − µH‖₂^k ‖X(0) − X*‖_F.   (18)

Hence, the spectral norm of Imn − µH describes how fast the approximated solution X(k) converges
to the exact solution X*. The smaller this norm is, the faster X(k) goes to X*. In that case, since
‖Imn − µH‖₂ < 1, if ‖X(k − 1) − X*‖_F ≠ 0 (i.e., X(k − 1) is not the exact solution), then

‖X(k) − X*‖_F < ‖X(k − 1) − X*‖_F.   (19)

Thus, the error at each iteration gets smaller than the previous one.
The above discussion is summarized in the following theorem.

Theorem 2. Suppose that the parameter µ is chosen as in Theorem 1 so that Algorithm 1 satisfies (AS). Then,
the convergence rate of the algorithm is governed by the spectral radius in (12). Moreover, the error estimates of
‖X(k) − X*‖_F relative to the previous step and to the initial step are provided by (17) and (18), respectively.
In particular, the error at each iteration gets smaller than the (nonzero) previous one, as in (19).

From (12), if the eigenvalues of µH are close to 1, then the spectral radius of the iteration matrix is
close to 0, and hence the errors vec X̃(k) and X̃(k) converge to 0 faster.

Remark 1. The convergence criteria and the convergence rate of Algorithm 1 depend on A1, A2, A3 and A4,
but not on E. However, the matrix E can be used for the stopping criterion.

The next proposition determines the number of iterations after which the approximated solution X(k)
is close to the exact solution X* in the sense that ‖X(k) − X*‖_F < e.

Proposition 1. According to Algorithm 1, for each given error e > 0, we have ‖X(k) − X*‖_F < e after k*
iterations, for any k* such that

k* > (log e − log ‖X(0) − X*‖_F) / log ‖Imn − µH‖₂.   (20)

Proof. From the estimation (18), we have

‖X(k) − X*‖_F ≤ ‖Imn − µH‖₂^k ‖X(0) − X*‖_F → 0   as k → ∞.

This means precisely that for each given e > 0, there is a k* ∈ N such that for all k ≥ k*,

‖Imn − µH‖₂^k ‖X(0) − X*‖_F < e.

Taking logarithms, we have that the above condition is equivalent to (20). Thus, if we run Algorithm 1 k*
times, then we get ‖X(k) − X*‖_F < e as desired.
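Bound (20) is easy to evaluate once ‖Imn − µH‖₂ and the initial error are known; a tiny helper (ours) is sketched below.

```python
import numpy as np

def iterations_needed(eps, X0, X_star, iter_matrix_norm):
    """Smallest integer k* satisfying (20); assumes ||I - mu*H||_2 < 1."""
    err0 = np.linalg.norm(X0 - X_star, "fro")
    k_star = (np.log(eps) - np.log(err0)) / np.log(iter_matrix_norm)
    return int(np.ceil(k_star))
```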

3.3. Optimal Parameter


We discuss the fastest convergence factor for Algorithm 1.

Theorem 3. The optimal convergence factor µ for which Algorithm 1 satisfies (AS) is the one that minimizes
‖Imn − µH‖₂. If, in addition, H is symmetric, then the optimal convergence factor for which the algorithm
satisfies (AS) is determined by

µopt = 2 / (λmin(H) + λmax(H)).   (21)

In this case, the convergence rate is governed by

ρ(Imn − µH) = (λmax(H) − λmin(H)) / (λmax(H) + λmin(H)) = (κ − 1)/(κ + 1),   (22)

where κ denotes the condition number of H, and we have the following estimates:

‖X(k) − X*‖_F ≤ ((κ − 1)/(κ + 1)) ‖X(k − 1) − X*‖_F,   (23)
‖X(k) − X*‖_F ≤ ((κ − 1)/(κ + 1))^k ‖X(0) − X*‖_F.   (24)

Proof. From Theorem 2, it is clear that the fastest convergence is attained at a convergence factor
that minimizes ‖Imn − µH‖₂. Now, assume that H is symmetric. Then, Imn − µH is also symmetric,
thus all its eigenvalues are real and

‖Imn − µH‖₂ = ρ(Imn − µH).   (25)

For convenience, denote a = λmin(H), b = λmax(H), and

f(µ) := ρ(Imn − µH) = max{|1 − µa|, |1 − µb|}.

First, we consider the case λmin(H) > 0. To obtain the fastest convergence factor, according to (15),
we must solve the following optimization problem:

min_{0 < µ < 2/λmax(H)} ‖Imn − µH‖₂ = min_{0 < µ < 2/b} f(µ).

We obtain that the minimizer is given by µopt = 2/(a + b), so that f(µopt) = (b − a)/(b + a). For the
case λmax(H) < 0, we solve the following optimization problem:

min_{2/λmin(H) < µ < 0} ‖Imn − µH‖₂ = min_{2/a < µ < 0} f(µ).

A similar argument yields the same minimizer (21) and the same convergence rate (22). From (17),
(18) and (25), we obtain the bounds (23) and (24).
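For a symmetric H, the optimal factor (21) and the resulting rate are immediate to compute; the following sketch (our own helper) does so.

```python
import numpy as np

def optimal_mu(H):
    """Optimal convergence factor (21) and convergence rate (22) for symmetric H."""
    lam = np.linalg.eigvalsh(H)            # real eigenvalues, sorted ascending
    a, b = lam[0], lam[-1]                 # lambda_min(H), lambda_max(H)
    if a * b <= 0:
        raise ValueError("lambda_min(H) and lambda_max(H) must share the same sign.")
    mu_opt = 2.0 / (a + b)
    rate = abs(b - a) / abs(a + b)         # rho(I - mu_opt * H)
    return mu_opt, rate
```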

4. Numerical Simulations
In this section, we report numerical results to illustrate the effectiveness of Algorithm 1. We consider
various sizes of matrix systems, namely, small (2 × 2), medium (10 × 10) and large (100 × 100). For the
generalized Sylvester equation, we compare the performance of Algorithm 1 to the GI and LSI algorithms.

For the Sylvester equation, we compare our algorithm with the GI, RGI, AGBI and JGI algorithms. All iterations
have been carried out in the same environment: MATLAB R2017b, Intel(R) Core(TM) i7-7660U CPU @ 2.5 GHz,
8.00 GB RAM, bus speed 2133 MHz. We abbreviate IT and CT for the number of iterations and the CPU time (in seconds),
respectively. At the k-th step of the iteration, we consider the following error:

δ(k) := ‖E − A1X(k)A2 − A3X(k)A4‖_F,

where X(k) is the k-th approximated solution of the corresponding system.

4.1. Numerical Simulation for the Generalized Sylvester Matrix Equation


Example 1. Consider the matrix equation A1 XA2 + A3 XA4 = E where
" # " # " #
0.6959 −0.6385 −0.0688 −0.5309 0.4076 0.7184
A1 = , A2 = , A3 = ,
0.6999 0.0336 0.3196 0.6544 −0.8200 0.9686
" # " #
0.5313 0.1056 0.7788 0.0908
A4 = , E= .
0.3251 0.6110 0.4235 0.2665

Then, the exact solution of X is


" #
∗ 1.3036 −0.0532
X = .
1.2725 1.2284

Choose X(0) = zeros(2). In this case, all eigenvalues of H have positive real parts. The effect of changing the
convergence factor µ is illustrated in Figure 1. According to Theorem 1, the criterion for the convergence of X(k)
is that µ ∈ (0, 4.1870). Since µ1, µ2, µ3, µ4 satisfy this criterion, the error becomes smaller and goes to zero as
k increases, as in Figure 1. Among them, µ4 = 4.0870 gives the fastest convergence. For µ5 and µ6, which do not
meet the criterion, the error δ(k) does not converge to zero.

Figure 1. Error of Example 1.

Example 2. Suppose that A1XA2 + A3XA4 = E, where A1, A2, A3, A4 and E are the 10 × 10 matrices

A1 = tridiag(1, 3, −1), A2 = tridiag(1, 1, −2), A3 = tridiag(−2, −2, 3),


A4 = tridiag(−3, 2, −1) and E = heptadiag(1, −2, 1, −2, −2, 1, −3).

Here, E is a heptadiagonal matrix—i.e., a band matrix with bandwidth 3. Choose an initial matrix X(0) = zeros(10),
where zeros(n) is the n-by-n matrix containing 0 in every position. We compare Algorithm 1 with the direct
method and the LSI and GI algorithms. Table 1 shows the errors at the final step of the iteration as well as the computation
time after 75 iterations. Figure 2 illustrates that the approximated solutions via LSI diverge, while those via GI
and MJGI converge. Table 1 and Figure 2 imply that our algorithm requires significantly less computational time
and yields smaller errors than the others.
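For reproducibility, the banded test matrices can be generated as below; we read tridiag(a, b, c) as the matrix with constant subdiagonal a, diagonal b and superdiagonal c, and heptadiag as seven constant diagonals listed from the lowest to the highest offset—this convention is our assumption, since the paper does not state it explicitly.

```python
import numpy as np

def banded(n, diags):
    """n-by-n matrix with constant diagonals; diags maps offset -> value."""
    A = np.zeros((n, n))
    for offset, value in diags.items():
        A += value * np.eye(n, k=offset)
    return A

def tridiag(n, a, b, c):
    return banded(n, {-1: a, 0: b, 1: c})            # assumed reading of tridiag(a, b, c)

def heptadiag(n, *vals):
    return banded(n, dict(zip(range(-3, 4), vals)))  # offsets -3..3

# Example 2 data under this convention:
A1 = tridiag(10, 1, 3, -1)
E = heptadiag(10, 1, -2, 1, -2, -2, 1, -3)
```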

Table 1. Computational time and error for Example 2.

Method IT CT Error: δ(75)


Direct - 0.0364 -
LSI 75 0.0125 1.1296 × 105
GI 75 0.0049 1.4185
MJGI 75 0.0022 0.5251

Figure 2. Error of Example 2.

Example 3. We consider the equation A1 XA2 + A3 XA4 = E in which A1 , A2 , A3 , A4 and E are 100 × 100
matrices determined by

A1 = tridiag(1, 1, −1), A2 = tridiag(1, 2, −2), A3 = tridiag(−1, −2, 3),


A4 = tridiag(−2, 1, −1) and E = heptadiag(1, 2, −4, 1, −2, 2, −3).

The initial matrix is given by X(0) = zeros(100). We run the LSI, GI and MJGI algorithms using

µ = 0.1,   µ = (‖A1‖₂‖A2‖₂ + ‖A3‖₂‖A4‖₂)^−1,   µ = 2(‖A1‖₂‖A2‖₂ + ‖A3‖₂‖A4‖₂)^−1,

respectively. The reported results in Table 2 and Figure 3 illustrate that the approximated solutions generated from
LSI diverge, while those from GI and MJGI converge. Both the computational time and the error δ(100) from MJGI
are less than those from GI.

Table 2. Computational time and error for Example 3.

Method IT CT Error: δ(100)


Direct - 34.6026 -
LSI 100 0.1920 2.7572 × 104
GI 100 0.0849 4.7395
MJGI 100 0.0298 1.8844

Figure 3. Comparison of Example 3.

4.2. Numerical Simulation for Sylvester Matrix Equation


Assume that the Sylvester equation

A1 X + XA4 = E (26)

has a unique solution. This condition is equivalent to the invertibility of the Kronecker sum A4^T ⊕ A1,
i.e., all possible sums of an eigenvalue of A1 and an eigenvalue of A4 are nonzero. To solve (26), we propose
Algorithm 2, in which

T(k) := E − A1X(k) − X(k)A4.

Algorithm 2: Modified Jacobi-gradient based iterative (MJGI) algorithm for the Sylvester equation

Input: A1, A4, E, X(0);
Choose µ ∈ R, e > 0 and set k = 1;
for k = 1, . . . , n do
    T(k − 1) = E − A1X(k − 1) − X(k − 1)A4;
    x_ij(k) = x_ij(k − 1) + µ(d_ii^(1) + d_jj^(4)) t_ij(k − 1), where T(k) = [t_ij(k)];
    if ‖T(k − 1)‖_F < e then
        break;
    else
        k = k + 1;
    end
end
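A NumPy sketch of Algorithm 2 (again under our own naming) mirrors the MJGI sketch for the generalized equation, with the weight d_ii^(1) + d_jj^(4).

```python
import numpy as np

def mjgi_sylvester(A1, A4, E, X0, mu, tol=1e-8, max_iter=1000):
    """MJGI sketch for the Sylvester equation A1 X + X A4 = E."""
    W = np.add.outer(np.diag(A1), np.diag(A4))   # W[i, j] = d_ii^(1) + d_jj^(4)
    X = X0.copy()
    for k in range(1, max_iter + 1):
        T = E - A1 @ X - X @ A4                  # residual T(k-1)
        if np.linalg.norm(T, "fro") < tol:
            break
        X = X + mu * W * T
    return X, k
```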

Example 4. Consider the equation A1 X + XA4 = E, in which E is the same matrix as in the previous example,

A1 = tridiag(2, −1, 1) ∈ R10×10 and A4 = tridiag(−1, 1, −2) ∈ R10×10 .

In this case, all eigenvalues of the associated matrix H have positive real parts, so we can apply our algorithm.
We compare our algorithm with the GI, RGI, AGBI and JGI algorithms. The results after running 100 iterations are
shown in Figure 4 and Table 3. According to the errors and CTs in Table 3 and Figure 4, our algorithm uses less
computational time and has smaller errors than the others.

Table 3. CTs and errors for Example 4.

Method IT CT Error: δ(100)


Direct - 0.0118 -
GI 100 0.0051 2.5981
RGI 100 0.0061 3.4741
AGBI 100 0.0051 7.3306
JGI 100 0.0038 17.2652
MJGI 100 0.0028 0.4281

Figure 4. Errors of Example 4.

5. Conclusions and Suggestion


A modified Jacobi-gradient (MJGI) algorithm (Algorithm 1) is proposed for solving the generalized
Sylvester matrix Equation (3). In order for the MJGI algorithm to be applicable for any size of matrix
system and any initial matrix, the convergence factor µ must be chosen properly according to
Theorem 1. In this case, the iteration matrix Imn − µH has a spectral radius less than 1. When the
iteration matrix is symmetric, we determine the optimal convergence factor µopt, which lets the
algorithm reach its fastest rate of convergence. The asymptotic convergence rate of the algorithm
is governed by the spectral radius of Imn − µH. So, if the eigenvalues of µH are close to 1, then the algorithm
converges faster in the long run. The numerical examples reveal that our algorithm is suitable for small
(2 × 2), medium (10 × 10) and large (100 × 100) matrix systems. In addition, the MJGI algorithm
performs well compared to recent gradient-based iterative algorithms. For future work, we may add another
parameter for an updating step to make the algorithm converge faster—see [25]. Another possible direction
is to apply the idea in this paper to derive an iterative algorithm for nonlinear matrix equations.

Author Contributions: Supervision, P.C.; software, N.S.; writing—original draft preparation, N.S.; writing—review
and editing, P.C. All authors contributed equally and significantly in writing this article. All authors have read
and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: The first author received financial support from the RA-TA graduate scholarship from the
faculty of Science, King Mongkut’s Institute of Technology Ladkrabang, Grant. No. RA/TA-2562-M-001 during
his Master’s study.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Shang, Y. Consensus seeking over Markovian switching networks with time-varying delays and uncertain
topologies. Appl. Math. Comput. 2016, 273, 1234–1245. [CrossRef]
2. Shang, Y. Average consensus in multi-agent systems with uncertain topologies and multiple time-varying
delays. Linear Algebra Appl. 2014, 459, 411–429. [CrossRef]
3. Golub, G.H.; Nash, S.; Van Loan, C.F. A Hessenberg-Schur method for the matrix AX + XB = C. IEEE Trans.
Automat. Control. 1979, 24, 909–913. [CrossRef]
4. Ding, F.; Chen, T. Hierarchical least squares identification methods for multivariable systems. IEEE Trans.
Automat. Control 1997, 42, 408–411. [CrossRef]
5. Benner, P.; Quintana-Orti, E.S. Solving stable generalized Lyapunov equations with the matrix sign function.
Numer. Algorithms 1999, 20, 75–100. [CrossRef]
6. Starke, G.; Niethammer, W. SOR for AX − XB = C. Linear Algebra Appl. 1991, 154–156, 355–375. [CrossRef]
7. Jonsson, I.; Kagstrom, B. Recursive blocked algorithms for solving triangular systems—Part I: One-sided
and coupled Sylvester-type matrix equations. ACM Trans. Math. Softw. 2002, 28, 392–415. [CrossRef]
8. Jonsson, I.; Kagstrom, B. Recursive blocked algorithms for solving triangular systems—Part II: Two-sided
and generalized Sylvester and Lyapunov matrix equations. ACM Trans. Math. Softw. 2002, 28, 416–435.
[CrossRef]
9. Kaabi, A.; Kerayechian, A.; Toutounian, F. A new version of successive approximations method for solving
Sylvester matrix equations. Appl. Math. Comput. 2007, 186, 638–648. [CrossRef]
10. Lin, Y.Q. Implicitly restarted global FOM and GMRES for nonsymmetric matrix equations and Sylvester
equations. Appl. Math. Comput. 2005, 167, 1004–1025. [CrossRef]
11. Kressner, D.; Sirkovic, P. Truncated low-rank methods for solving general linear matrix equations.
Numer. Linear Algebra Appl. 2015, 22, 564–583. [CrossRef]
12. Dehghan, M.; Shirilord, A. A generalized modified Hermitian and skew-Hermitian splitting (GMHSS)
method for solving complex Sylvester matrix equation. Appl. Math. Comput. 2019, 348, 632–651. [CrossRef]
13. Dehghan, M.; Shirilord, A. Solving complex Sylvester matrix equation by accelerated double-step scale
splitting (ADSS) method. Eng. Comput. 2019. [CrossRef]
14. Li, S.Y.; Shen, H.L.; Shao, X.H. PHSS iterative method for solving generalized Lyapunov equations. Mathematics
2019, 7, 38. [CrossRef]
15. Shen, H.L.; Li, Y.R.; Shao, X.H. The four-parameter PSS method for solving the Sylvester equation.
Mathematics 2019, 7, 105. [CrossRef]
16. Hajarian, M. Generalized conjugate direction algorithm for solving the general coupled matrix equations
over symmetric matrices. Numer. Algorithms 2016, 73, 591–609. [CrossRef]
17. Hajarian, M. Extending the CGLS algorithm for least squares solutions of the generalized Sylvester-transpose
matrix equations. J. Frankl. Inst. 2016, 353, 1168–1185. [CrossRef]
18. Dehghan, M.; Mohammadi-Arani, R. Generalized product-type methods based on Bi-conjugate gradient
(GPBiCG) for solving shifted linear systems. Comput. Appl. Math. 2017, 36, 1591–1606. [CrossRef]
19. Ding, F.; Chen, T. Gradient based iterative algorithms for solving a class of matrix equations. IEEE Trans.
Automat. Control 2005, 50, 1216–1221. [CrossRef]

20. Niu, Q.; Wang, X.; Lu, L.-Z. A relaxed gradient based algorithm for solving Sylvester equation. Asian J. Control
2011, 13, 461–464. [CrossRef]
21. Zhang, X.D.; Sheng, X.P. The relaxed gradient based iterative algorithm for the symmetric (skew symmetric)
solution of the Sylvester equation AX + XB = C. Math. Probl. Eng. 2017, 2017, 1624969. [CrossRef]
22. Xie, Y.J.; Ma, C.F. The accelerated gradient based iterative algorithm for solving a class of generalized
Sylvester-transpose matrix equation. Appl. Math. Comput. 2012, 218, 5620–5628. [CrossRef]
23. Ding, F.; Chen, T. Iterative least-squares solutions of coupled Sylvester matrix equations. Syst. Control Lett.
2005, 54, 95–107. [CrossRef]
24. Fan, W.; Gu, C.; Tian, Z. Jacobi-gradient iterative algorithms for Sylvester matrix equations. In Proceedings
of the 14th Conference of the International Linear Algebra Society, Shanghai University, Shanghai, China,
16–20 July 2007.
25. Tian, Z.; Tian, M.; Gu, C.; Hao, X. An accelerated Jacobi-gradient based iterative algorithm for solving
Sylvester matrix equations. Filomat 2017, 31, 2381–2390. [CrossRef]
26. Ding, F.; Liu, P.X.; Chen, T. Iterative solutions of the generalized Sylvester matrix equations by using the
hierarchical identification principle. Appl. Math. Comput. 2008, 197, 41–50. [CrossRef]
27. Horn, R.A.; Johnson, C.R. Topics in Matrix Analysis; Cambridge University Press: New York, NY, USA, 1991.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
