Eigenvalopt
1.1 Introduction
\[
v'(t) = \sum_{j=0}^{m} A_j(x)\, v(t - \tau_j),
\]
where $\tau_0 = 0$.
The eigenvalues of the system are the roots of
\[
\det(\Lambda(\lambda; x)) = 0,
\]
with
\[
\Lambda(\lambda; x) = \lambda I - A_0(x) - \sum_{j=1}^{m} A_j(x)\, e^{-\lambda \tau_j}.
\]
The number of eigenvalues in this case is generally infinite, but within any
right half-plane the number of eigenvalues is finite [17]. We let F(x) denote the
infinitesimal generator corresponding to the solution operator of the delay
system, and write its spectrum as σ(F(x)).
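As a concrete illustration of the characteristic matrix above, the following sketch (our own illustration, not part of the original implementation; function and argument names are assumptions) evaluates Λ(λ; x) for given matrices A_j(x) and delays τ_j.

```python
import numpy as np

def char_matrix(lam, A_list, taus):
    """Evaluate Lambda(lam; x) = lam*I - A0 - sum_j Aj * exp(-lam*tau_j).

    A_list = [A0, A1, ..., Am] are the system matrices already evaluated at x;
    taus = [tau_1, ..., tau_m] are the corresponding delays (tau_0 = 0).
    """
    n = A_list[0].shape[0]
    Lam = lam * np.eye(n, dtype=complex) - A_list[0]
    for Aj, tau_j in zip(A_list[1:], taus):
        Lam = Lam - Aj * np.exp(-lam * tau_j)
    return Lam  # the eigenvalues lam satisfy det(char_matrix(lam, ...)) == 0
```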
As an illustrative special case, consider a simpler problem of optimizing
the spectrum of a linear system controlled with static, undelayed output
feedback, where the operator F(x) reduces to the matrix
\[
F(x) = A + BXC,
\]
where A is the open-loop system matrix, B the input matrix, C the output
matrix, and X is formed by arranging the components of x into a matrix of
the appropriate dimensions. This yields a linear eigenvalue problem with a
finite spectrum.
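For this static output feedback case, the spectral abscissa can be evaluated directly from the matrix eigenvalues. The sketch below is a minimal illustration, assuming x is reshaped row-wise into the gain matrix X; it is not the authors' code.

```python
import numpy as np

def spectral_abscissa(A, B, C, x):
    """Return alpha(F(x)) for F(x) = A + B @ X @ C, where X is x reshaped
    into a (number of inputs) x (number of outputs) gain matrix."""
    X = np.asarray(x, dtype=float).reshape(B.shape[1], C.shape[0])
    F = A + B @ X @ C
    return np.max(np.linalg.eigvals(F).real)
```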
The problem of interest can be written in the form
\[
\min_{x \in \mathbb{R}^n} f(x) := \alpha(F(x)), \tag{1.1}
\]
where the spectral abscissa α(F(x)) corresponds to the largest real part of the
eigenvalues of F(x).
The properties of the spectrum of a matrix depending on parameters form an in-
volved topic; for an early work, see [1]. A more thorough analysis with respect
to the spectral abscissa in particular was presented in [4]. For recent work see,
for instance, [5, 14]. An important fact that permits much of the subsequent
analysis is that the spectrum {λ0 (x), λ1 (x), ..., λN −1 (x)} of a matrix F (x) is
a continuous function of x. Typically, local minimizers correspond to points x
at which some of the eigenvalues coalesce, i.e., Re(λ0 (x)) = Re(λ1 (x)) = ....
In [22] it was shown, however, that although for symmetric F (x), f (x) is con-
vex, in the nonsymmetric case f (x) is not even Lipschitz. If for all x, all of the
active eigenvalues (i.e., λi such that Re(λ0(x)) = Re(λi(x))) were simple, then
f (x) would correspond to the maximum of a set of smooth surfaces. However,
this is typically not the case. Thus, the optimization problem is difficult to
solve because it is nonconvex, nonsmooth, and typically non-Lipschitz.
It can be observed, however, that the extensive variational analysis of
the spectral abscissa has been performed for matrix eigenvalue optimization,
rather than for problems with time delays. The main difference lies in the fact
that in the generic delay case there are infinitely many eigenvalues. Thus, at
this point, we can expect that optimizing the spectrum of a nonlinear eigenvalue
problem is at least as difficult as for matrices, and that all of the variational
properties presenting challenges extend appropriately.
We present the plot of a two-dimensional problem in Figure 1.1. In this
example, first given in [23], F(x) = A + BK, with
\[
A = \begin{pmatrix} 0.1 & -0.03 & 0.2 \\ 0.2 & 0.05 & 0.01 \\ -0.06 & 0.2 & 0.07 \end{pmatrix}, \quad
B = \frac{1}{2}\begin{pmatrix} -1 \\ -2 \\ 1 \end{pmatrix}, \quad
K^T = \begin{pmatrix} x_1 \\ x_2 \\ 1.4 \end{pmatrix}.
\]
Notice that all of the features of α(F(x)) described above (nonconvexity,
nonsmoothness, and non-Lipschitz behavior) are evident in the figure.
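A surface such as the one in Figure 1.1 can be reproduced by evaluating α(A + BK) on a grid of (x1, x2) values. The sketch below uses the matrices as reconstructed above (the scaling of B and the grid range are our assumptions) and leaves the actual plotting, e.g. with matplotlib, to the reader.

```python
import numpy as np

# Matrices of the example as reconstructed above (the 1/2 scaling of B is our reading).
A = np.array([[ 0.1, -0.03, 0.2 ],
              [ 0.2,  0.05, 0.01],
              [-0.06, 0.2,  0.07]])
B = 0.5 * np.array([[-1.0], [-2.0], [1.0]])

def alpha(x1, x2):
    """Spectral abscissa of F(x) = A + B K with the third gain fixed at 1.4."""
    K = np.array([[x1, x2, 1.4]])
    return np.max(np.linalg.eigvals(A + B @ K).real)

# Evaluate on an (arbitrary) grid of the two free parameters.
xs = np.linspace(-5.0, 5.0, 200)
surface = np.array([[alpha(a, b) for a in xs] for b in xs])
```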
As developed for matrices in [20, 21], this approach is based on the realization
that the problem $\min_{x\in\mathbb{R}^n}\alpha(F(x))$ can be rewritten as
\[
\begin{aligned}
\min_{\gamma \in \mathbb{R},\, x \in \mathbb{R}^n} \quad & \gamma, \\
\text{subject to} \quad & \gamma \ge \operatorname{Re}(\lambda_i(F(x))) \ \text{for all } i.
\end{aligned} \tag{1.2}
\]
Since the spectrum of the delay system is in general infinite, we restrict
attention to a finite part of it and instead consider
\[
\begin{aligned}
\min_{\gamma \in \mathbb{R},\, x \in \mathbb{R}^n} \quad & \gamma, \\
\text{subject to} \quad & \gamma \ge \operatorname{Re}(\lambda_i(F(x))) \ \text{for all } i \text{ such that } \operatorname{Re}(\lambda_i(F(x))) > \lambda_c,
\end{aligned}
\]
where λc defines the right half-plane to which we restrict our attention; it
depends on the number of eigenvalue surfaces we want, or are able, to incor-
porate and on the location of these eigenvalues. We order the spectrum
as Re(λ0(F(x))) ≥ Re(λ1(F(x))) ≥ Re(λ2(F(x))) ≥ ... ≥ Re(λNc(F(x))),
where Nc + 1 is the number of eigenvalues that lie to the right of λc.
In the case that the eigenvalues of F(x) are isolated and simple, the
gradient and Hessian of λi(F(x)) with respect to x are well defined for all
λi(F(x)) ∈ σ(F(x)). In this case, the objective function is the maximum of a
set of smooth surfaces, and is thus piecewise smooth. Solving the problem by
successive approximation of each surface is standard, and the associated con-
vergence theory in [20, 21] proves that the procedure outlined below converges
to the solution.
Of course, in the general setting, eigenvalues need not be simple and iso-
lated. The set of points at which the objective function α(F(x)) in (1.1)
is nonsmooth has Lebesgue measure zero in Rn. This implies that for almost
every x, the function α(F(x)) is locally a smooth surface, corresponding to the
value of λ0(F(x)) as a function of x, and hence that the algorithm (and the
computation of linearizations) is well defined at almost every point.
However, as minimizers tend to be points of nonsmooth and non-Lipschitz
behavior, such a scheme is no longer certifiably convergent. In practice, how-
ever, following some modifications, we will see that it still performs well.
For now, consider the simple case in which all eigenvalues λi(F(x)) are simple
and isolated. It can be shown that, when each Aj(x) depends smoothly on x
and the eigenvalue has multiplicity 1, the gradient of the surface corresponding
to each eigenvalue, as well as ∇2xx λi(F(x)), can be calculated from the
following formulas [12, 16, 17]:
\[
\nabla_x \lambda_i = \frac{u_i^* \left( \dfrac{\partial A_0}{\partial x} + \sum_{j=1}^{m} \dfrac{\partial A_j}{\partial x}\, e^{-\lambda_i \tau_j} \right) v_i}{u_i^* \left( I + \sum_{j=1}^{m} \tau_j e^{-\lambda_i \tau_j} A_j \right) v_i},
\]
where ui and vi are the left and right eigenvectors of F(x) corresponding to
the eigenvalue λi, u* denotes the conjugate transpose of u, and the second
derivatives may be calculated explicitly by
\[
\nabla^2_{xx} \lambda_i(x) = -\,\frac{u_i^* \left( \nabla^2_{x\lambda} \Lambda(\lambda_i, x) \otimes \nabla_x \lambda_i + \nabla^2_{xx} \Lambda(\lambda_i, x) + \nabla^2_{\lambda\lambda} \Lambda(\lambda_i, x) \otimes (\nabla_x \lambda_i) \otimes (\nabla_x \lambda_i) \right) v_i}{u_i^* \nabla_\lambda \Lambda(\lambda_i, x)\, v_i}
+ \frac{u_i^* \left( 2\nabla_x \Lambda(\lambda_i, x) + 2\nabla_\lambda \Lambda(\lambda_i, x) \otimes \nabla_x \lambda_i \right) \nabla_x v_i}{u_i^* \nabla_\lambda \Lambda(\lambda_i, x)\, v_i},
\]
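For the undelayed (matrix) case, the gradient formula above reduces to the classical simple-eigenvalue sensitivity dλi/dxk = ui*(∂F/∂xk)vi / (ui* vi). The following sketch (our illustration, with hypothetical argument names) computes it from the left and right eigenvectors returned by scipy.

```python
import numpy as np
import scipy.linalg as sla

def simple_eig_gradient(F, dF_list, which=0):
    """Gradient of the eigenvalue of F with the `which`-th largest real part,
    assumed simple; dF_list contains the matrices dF/dx_k for each parameter."""
    w, vl, vr = sla.eig(F, left=True, right=True)
    i = np.argsort(-w.real)[which]
    u, v = vl[:, i], vr[:, i]                 # left and right eigenvectors
    denom = np.vdot(u, v)                     # u^* v
    return np.array([np.vdot(u, dF @ v) / denom for dF in dF_list])
```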
and if this does not hold we set $\Delta_{k+1} = \gamma_1 \Delta_k$, where $\gamma_1$ is a constant satis-
fying $\gamma_1 \in (0, 1)$, and re-solve the subproblem.
If (1.7) holds, we follow the mixed trust-region/line-search procedure pre-
sented by Gertz [10], in which a backtracking line search reduces the step
size t until decrease is achieved (α(F(xk + t∆x)) < α(F(xk))), and the
next trust-region radius corresponds to t‖∆x‖:
\[
\Delta_{k+1} = \begin{cases} \gamma_2 \Delta_k & \text{if } \alpha(F(x_k + \Delta x)) < \alpha(F(x_k)), \\ t\,\|\Delta x\| & \text{otherwise}, \end{cases} \tag{1.8}
\]
where $\gamma_2$ is a constant satisfying $\gamma_2 > 1$.
We update the trust-region radius simply by increasing it if we achieve descent,
and decreasing it otherwise. For consistency with convergence theory [6], we
would enforce sufficient decrease conditions with respect to the predicted (from
the quadratic approximation) and actual decrease. However, since a lax accep-
tance criterion (e.g., with a small constant multiplying the predicted-to-actual
decrease ratio) is practically equivalent to this condition, we proceed as in the
line-search criteria for the gradient sampling method [3] and simply enforce
descent.
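A minimal sketch of the acceptance and radius-update logic just described (rule (1.8) combined with the backtracking line search); alpha_of stands in for x ↦ α(F(x)), and the fallback when no decrease is found at all is our own simplification, not the authors' rule.

```python
import numpy as np

def accept_step(alpha_of, xk, dx, Delta_k, gamma1=0.5, gamma2=2.0, eta=0.5,
                max_backtracks=30):
    """Return (x_{k+1}, Delta_{k+1}) following the descent test and rule (1.8)."""
    fk = alpha_of(xk)
    if alpha_of(xk + dx) < fk:                # descent with the full step
        return xk + dx, gamma2 * Delta_k      # accept and enlarge the radius
    t = 1.0
    for _ in range(max_backtracks):           # backtracking line search
        t *= eta
        if alpha_of(xk + t * dx) < fk:
            return xk + t * dx, t * np.linalg.norm(dx, np.inf)
    return xk, gamma1 * Delta_k               # no decrease found: shrink the radius
```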
If we omit the second-order term Hk, the algorithm becomes a sequen-
tial linear programming (SLP) method. In either case, the trust-region
constraint ensures that the subproblem solution is bounded.
We found that in the nonsymmetric case, the basic SL/QP algorithm would
frequently stall at non-optimal points. Recall that the reliability of the al-
gorithm depends on some strong assumptions on the problem. To give a
generic geometric picture of the situation in which this occurs, consider a
"valley", i.e., an (n − 1)-dimensional hypersurface in Rn on which ∇λi(F(x))
is undefined. It can happen that across this (n − 1)-dimensional manifold the
derivatives of λi(F(x)) jump discontinuously.
Locally, the directional derivative of α(F(x)) is steeper towards the valley
than parallel to it, so a local approximation that regards only the eigen-
value surfaces at a point on one side of the valley will result in the step of
steepest decrease pointing towards the valley. Since the surface on the other
side of the valley is not seen from the original side, it is not incorporated
into the subproblem. We illustrate this scenario in Figure 1.2.
\[
\begin{aligned}
\min_{\Delta x,\, \Delta\gamma} \quad & \Delta\gamma, \\
\text{subject to} \quad & \Delta\gamma + \alpha(F(x_k)) \ge \operatorname{Re}(\lambda_i(F(x_k))) + \operatorname{Re}(\nabla_x \lambda_i(F(x_k)))^T \Delta x, \quad i \in \{0, \ldots, N_k\}, \\
& \Delta\gamma + \alpha(F(x_k)) \ge \operatorname{Re}(\lambda^{(i)}(F(x^{(i)}))) + \operatorname{Re}(\nabla_x \lambda^{(i)}(F(x^{(i)})))^T (x_k + \Delta x - x^{(i)}), \quad i \in M_k, \\
& \|\Delta x\|_\infty \le \Delta_k,
\end{aligned} \tag{1.10}
\]
for ∆xk.
9: Calculate {λi (F (xk + ∆xk ))} and {∇x λi (F (xk + ∆xk ))} for i ∈ {0, .., Nk }
10: if α(F (xk + ∆xk )) < α(F (xk )) then
11: Set xk+1 ← xk + ∆xk .
12: Set ∆k+1 ← γ2 ∆k .
13: else
14: Store {xk + ∆xk , Re(λ0 (F (xk + ∆xk ))), ∇x Re(λ0 (F (xk + ∆xk )))}
15: in Mk+1 .
16: Find t such that α(F (xk + t∆xk )) < α(F (xk )).
17: Set xk+1 ← xk + t∆xk .
18: Set ∆k+1 ← t||∆xk ||.
19: end if
20: Set k ← k + 1.
21: Determine Nk . Typically, set Nk = N , the size of F (x).
22: Calculate all {λi (F (xk ))} and {∇x λi (F (xk ))} for i ∈ {0, .., Nk }.
23: end while
24: Add the last point (xf , α(F (xf ))) to F .
25: end for
return {xf , α(F (xf ))} corresponding to the lowest value of α(F (xf )) in F .
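To make the SLP step concrete, the sketch below solves the linearized subproblem (1.10), without the memory constraints indexed by Mk, as a linear program with scipy.optimize.linprog. It is an illustration of the technique under our own interface assumptions, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.optimize import linprog

def slp_subproblem(alpha_k, re_lams, re_grads, Delta_k):
    """Solve (1.10) without the M_k terms.

    alpha_k:  alpha(F(x_k));
    re_lams:  array (N,) of Re(lambda_i(F(x_k))) for the retained eigenvalues;
    re_grads: array (N, n) of Re(grad_x lambda_i(F(x_k)));
    returns (dx, dgamma)."""
    N, n = re_grads.shape
    c = np.zeros(n + 1); c[-1] = 1.0                     # minimize dgamma
    # Re(grad_i)^T dx - dgamma <= alpha_k - Re(lambda_i)
    A_ub = np.hstack([re_grads, -np.ones((N, 1))])
    b_ub = alpha_k - re_lams
    bounds = [(-Delta_k, Delta_k)] * n + [(None, None)]  # trust region; dgamma free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]
```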
Finally, we note that since this problem is nonconvex, there can be multiple
local minima, possibly necessitating the use of global optimization strategies.
In our experiments we have found this to be problem dependent, i.e., some
systems have many local minima while others do not. Given that the
objective function is not available in closed form, deterministic strategies for
global optimization are impractical to implement. In our implemen-
tation, we use ten random starting points, each initialized with components
drawn from a normal distribution centered at zero, and select the lowest
minimizer out of the ten runs.
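The multi-start strategy described above can be summarized as follows; local_solver stands in for one SLP/SQP run from a given starting point and is an assumed interface.

```python
import numpy as np

def multistart(local_solver, alpha_of, n, runs=10, seed=0):
    """Run `local_solver` from `runs` random normal starting points in R^n
    and return the point with the lowest spectral abscissa."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for _ in range(runs):
        x0 = rng.standard_normal(n)       # starting point drawn from N(0, I)
        xf = local_solver(x0)
        val = alpha_of(xf)
        if val < best_val:
            best_x, best_val = xf, val
    return best_x, best_val
```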
In practice, for many problems, the parameters are expected to lie in some
bounded region, permitting the use of more probabilistically sophisticated
strategies [25]. In addition, in many applications only a point at which the
system is stable, i.e., the spectral abscissa is below zero, is needed rather than
the absolute global minimizer, and so the presence of multiple local minimizers
can be treated with some laxity.
For all solvers, we used a stopping tolerance of 1e-4, meaning that the
algorithms terminate when the infinity norm of the previous step is smaller
than 1e-4. The SL/QP algorithms were coded in MATLAB, with all tests run
using MATLAB R2013a and performed on an Intel Core 2.2 GHz ×8 machine
running Ubuntu 14.04. For all algorithms we use the same procedure of
ten random starting points, each initialized from a normal distribution centered
at zero, and then pick the best solution (the one with the lowest objective
value) of the ten runs.
We list the parameter and initial values used in our implementations
of SL/QP in Table 1.1, where 0 < γ1 < 1, γ2 > 1, ∆m > 0, δm > 0, and S ∈ N.
We denote by kmax the maximum number of iterations, by LSkmax the maximum
number of line-search steps, and by η the backtracking contraction parameter.
and
\[
B(x) = \begin{pmatrix} -0.1 \\ -0.2 \\ 0.1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix}.
\]
We summarize the values and times in Table 1.2.
Table 1.2: Mean (standard deviation) for values and times for SLP and SQP (out of 500
sample runs). Value for each solver for each run is taken as the best of 10 random starting
points. Time is the total clock time taken to perform the ten runs.

        value            time
SLP     -0.081 (0.053)   4.6 (1.1)
SQP     -0.088 (0.055)   5.3 (1.6)
The best value found was -0.239, at x = (−0.21, 0.074, 1.38), for SLP and
−0.129, at x = (−0.036, 0.67, 0.94), for SQP. For SLP, 16% of the initial
random starting points corresponded to a stable (negative) value of the
spectral abscissa, and 25% of the final iterates did, among all of the trials.
For SQP these numbers were 10% and 23%, respectively.
The next example is given below, with
\[
x_{h,\mathrm{set}}(t) = \begin{pmatrix} K_1 & K_2 & K_3 & K_4 & K_5 \end{pmatrix} \begin{pmatrix} x_h(t) & x_a(t) & x_d(t) & x_c(t) & x_e(t) \end{pmatrix}^T.
\]
The results comparing SLP and SQP, which are qualitatively similar to those
of the first example, are given in Table 1.3.
Table 1.3: Mean (standard deviation) for values and times for BFGS with and without an
additional gradient sampling phase, SLP, and SQP (out of 500 sample runs).

        value             time
SLP     -0.083 (0.0062)   83.5 (140)
SQP     -0.088 (0.0103)   75.5 (76)
The best value found was -0.015, at k = (2.2, −12, −7.5, −6.9, 0.35), for SLP
and −0.016, at k = (−0.77, −3.0, −3.6, −4.2, 1.4), for SQP. For SLP, 2% of the
initial random starting points corresponded to a stable (negative) value of the
spectral abscissa, and 19% of the final iterates did, among all of the trials.
For SQP these numbers were 5% and 11%, respectively.
SQP and SLP appear to perform similarly, both in terms of final objective
value and time. In general, given enough random starting points, the
algorithms succeed in obtaining a stable controller for a fair number of trials.
1.4 Conclusion
1.5 Acknowledgements
References