An Algorithm For Quadratic Optimization With One Quadratic Constraint and Bounds On The Variables
1. Introduction
More often than not, optimization is used to find a solution to an inverse problem. Take as an
example the archetypal linear and discrete (or linearized and discretized) inverse problem:
Ax = d. Here, x is an n-vector with unknown model parameters and the m-vector d
contains the measured data. The (known) (m × n)-matrix A is an operator that maps the
model to the experimental data. The standard strategy to solve this problem is the following
optimization (generally attributed to Tikhonov 1963):
$$\min_{x \in \mathbb{R}^n} \;\|Ax - d\|_2^2 + \lambda O(x). \qquad (1)$$
The first term in the object function is the square of the misfit and the second is the
regularization term; λ, λ > 0, is the regularization parameter. Regularization is needed in
the case in which the problem is ill-posed, which means that the minimizer of the misfit
is not unique or very sensitive to small variations in the data d. The regularizer O(x)
contains a priori information on the model x. A popular choice is the quadratic function
$O(x) = \|Dx\|_2^2$, where D is a discrete approximation to the derivative operator; a small
O(x) thus guarantees smoothness. The definition of O(x) is subjective, and so is the choice
of λ. A popular practice is to vary λ until the solution satisfies some subjective criteria.
The indefiniteness of the regularization parameter λ can be removed by another piece of
prior information: a constraint to the misfit (Phillips 1962) or a constraint to the regularizer
(Ivanov 1962). Both strategies lead to a constrained optimization problem, one being the
dual of the other. A constraint to the misfit leads to the so-called discrepancy or error
principle: the misfit is required not to exceed the known level of the measurement error, $\|Ax - d\|_2 \le E$.
The inverse problem has thus been reduced to the following constrained optimization problem: minimize O(x) over the feasible set S.
The feasible set S is defined by n bounds on the variables to ensure positivity ($x \ge 0$),
plus one quadratic inequality constraint ($\|Ax - d\|_2 \le E$), which is provided by the
experiment. The solution can be interpreted as follows: of all possible states x that are
positive everywhere and that satisfy the experiment to within the measurement error, it is
the one that is most likely in the a priori sense.
Next we make an important assumption, namely that the experiment is specific enough
to add to the prior knowledge. This means that a point most likely in the a priori sense,
i.e. where O(x) is minimal, should not satisfy the experiment to within the measurement
error:
$$\|Ax - d\|_2 > E \quad \text{at the minimizer of } O(x). \qquad (3)$$
If this were not true, the experiment would not contribute to the solution. Instead, the
solution would be completely defined by the a priori information, a pathological situation.
From (3), it follows that the minimum of O(x) does not satisfy $\|Ax - d\|_2 \le E$. This
implies that this constraint is active. (It also means that it is irrelevant whether an inequality
or an equality constraint is imposed, as mentioned in the introduction.)
To solve the optimization problem, this paper uses the method of Lagrange
multipliers, which introduces a Lagrange multiplier for every constraint and produces a
set of conditions (the Kuhn–Tucker (KT) conditions) that the solution must satisfy. These
KT conditions are a system of equations and inequalities; a solution to this system is a
solution to the constrained optimization problem. In general, the complexity of a constrained
optimization problem increases with the number of constraints. To be more precise, the
complexity increases with the number of active constraints, because inactive constraints
can be ignored. It is clear that knowledge about which constraint is active and which
is not simplifies the constrained optimization. Above, we have seen that the constraint
$\|Ax - d\|_2 \le E$ is active.
We rewrite the constrained optimization problem into the following form. Consider
two continuous quadratic functions, O(x) and g(x), the object function and the constraint
function respectively:
$$O(x): \mathbb{R}^n \to \mathbb{R}, \qquad O(x) = x^t B x + 2 b^t x,$$
$$g(x): \mathbb{R}^n \to \mathbb{R}, \qquad g(x) = x^t C x + 2 c^t x + e.$$
Here, e is a scalar, x, b, c are real n-vectors and B and C are real, symmetric and positive
semidefinite $(n \times n)$ matrices; $t$ indicates the transpose. In our case: $B = D^t D$, $b = 0$,
$C = A^t A$, $c = -A^t d$, $e = d^t d - E^2$. We define the constrained optimization problem and
the feasible set S by
$$\min_{x \in S} O(x), \qquad S = \{x \in \mathbb{R}^n \mid g(x) \le 0 \wedge x \ge 0\}. \qquad (4)$$
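For concreteness, here is a minimal sketch (our own helper names, not the paper's code; NumPy assumed) that assembles B, b, C, c and e from A, d, D and E as defined above and tests membership of the feasible set S:

```python
import numpy as np

def build_quadratics(A, d, D, E):
    """Assemble the quadratic forms of (4) from the experiment (A, d, E)
    and the regularization operator D:
    B = D^T D, b = 0, C = A^T A, c = -A^T d, e = d^T d - E**2."""
    B = D.T @ D
    b = np.zeros(A.shape[1])
    C = A.T @ A
    c = -A.T @ d
    e = float(d @ d - E**2)
    return B, b, C, c, e

def O(x, B, b):
    """Object function O(x) = x^T B x + 2 b^T x."""
    return x @ B @ x + 2 * b @ x

def g(x, C, c, e):
    """Constraint function g(x) = x^T C x + 2 c^T x + e = ||A x - d||^2 - E^2."""
    return x @ C @ x + 2 * c @ x + e

def feasible(x, C, c, e, tol=1e-10):
    """Membership test for S: g(x) <= 0 and x >= 0 (up to a tolerance)."""
    return g(x, C, c, e) <= tol and np.all(x >= -tol)
```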
The Hessian matrices of the functions O and g are 2B and 2C respectively. Because
these matrices are positive semidefinite, the functions O and g are convex. As g is a
convex function, the set $\{x \in \mathbb{R}^n \mid g(x) \le 0\}$ is convex, as is the set $\{x \in \mathbb{R}^n \mid x \ge 0\}$. This
guarantees that their intersection, S, is a convex set as well. S is also a closed set.
We will find a solution to the constrained optimization problem (4) if: (1) the feasible
set S is not empty and (2) the kernels of B and C have only the zero vector in common. The
second condition is written as
$$\ker(B) \cap \ker(C) = \{0\}. \qquad (5)$$
This condition is quite logical. It merely states that the subspace of solution space that goes
unnoticed in the experiment, ker(C), does not overlap with the subspace towards which the
prior information is indifferent, ker(B). In other words, there must be no vectors that are
invisible both to the experiment and to the a priori information. Similarly, the solution is
stable if there exists no vector z for which both kBzk and kCzk are very small.
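Condition (5) can be tested numerically. Since B and C are symmetric and positive semidefinite, a common null vector exists exactly when B + C is singular, so it suffices to inspect the smallest eigenvalue of B + C; a tiny but nonzero value signals the near-degenerate, unstable case just mentioned. A sketch (ours, assuming NumPy):

```python
import numpy as np

def check_regularity(B, C, tol=1e-12):
    """Check condition (5): ker(B) ∩ ker(C) = {0}.

    For symmetric positive-semidefinite B and C, z^T (B + C) z = 0 forces
    B z = 0 and C z = 0, so a common null vector exists exactly when
    B + C is singular. Returns the verdict and the smallest eigenvalue,
    whose size also indicates how stable the solution will be.
    """
    smallest = np.linalg.eigvalsh(B + C)[0]   # eigenvalues in ascending order
    return smallest > tol, smallest
```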
In the literature, [B + λC] is referred to as a matrix pencil (e.g. Gantmacher 1959, Parlett
1980). Our pencil is symmetric and, by condition (5), regular. For regular pencils, it is
possible to derive a closed expression for the inverse $[B + \lambda C]^{-1}$. One method is based on
the generalized eigenproblem, as we now show. Another method is based on the GSVD,
to which we will return at the end of the section.
Because B and C are symmetric and positive semidefinite and because of (5), matrix
B + C is symmetric and positive definite. Therefore the matrix $[B + C]^{-1/2}$ exists and is
symmetric and positive definite. We write
$$[B + \lambda C]^{-1} = [B + C]^{-1/2}\,\bigl[I_n + (\lambda - 1)[B + C]^{-1/2} C [B + C]^{-1/2}\bigr]^{-1}\,[B + C]^{-1/2},$$
where $I_n$ is the identity matrix of order n. We are going to diagonalize the matrix between
brackets:
$$I_n + (\lambda - 1)[B + C]^{-1/2} C [B + C]^{-1/2}. \qquad (11)$$
To this end, we consider the eigenproblem
$$[B + C]^{-1/2} C [B + C]^{-1/2}\,|u_i\rangle = \mu_i |u_i\rangle, \qquad (12)$$
where we use Dirac notation. As the matrix in (12) is symmetric and positive semidefinite,
the eigenvalues satisfy
$$\mu_i \ge 0, \qquad i = 1, \ldots, n. \qquad (13)$$
As the matrix in (12) is symmetric, the spectral theorem states that the $|u_i\rangle$ form a complete
set of orthonormal eigenvectors:
$$\langle u_i | u_j \rangle = \delta_{ij} \qquad \text{and} \qquad \sum_{i=1}^{n} |u_i\rangle\langle u_i| = I_n.$$
Let
$$|y_i\rangle = [B + C]^{-1/2} |u_i\rangle;$$
then (12) becomes the generalized eigenproblem
$$C\,|y_i\rangle = \mu_i\,[B + C]\,|y_i\rangle. \qquad (14)$$
The final solution will be given in terms of the generalized eigenvectors |yi i and eigenvalues
µi . Therefore, the solution of the generalized eigenproblem forms the heart of the algorithm.
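In practice the generalized eigenproblem (14) can be handed to a standard solver. The sketch below (ours, assuming SciPy is available) uses scipy.linalg.eigh, whose default normalization of the eigenvectors, $Y^t(B+C)Y = I_n$, is exactly the normalization $\langle y_i|B+C|y_j\rangle = \delta_{ij}$ used here:

```python
import numpy as np
from scipy.linalg import eigh

def generalized_eigensystem(B, C):
    """Solve the generalized eigenproblem (14), C y_i = mu_i (B + C) y_i.

    scipy.linalg.eigh(a, b) solves a v = w b v and normalizes the
    eigenvectors so that Y^T (B + C) Y = I. Eigenvalues are returned in
    ascending order and, for this problem, lie in [0, 1].
    """
    mu, Y = eigh(C, B + C)   # columns of Y are the generalized eigenvectors
    return mu, Y
```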
From equation (14) it is not difficult to see that there are no eigenvalues $\mu_i$ larger than 1;
together with (13) this gives
$$0 \le \mu_i \le 1, \qquad i = 1, \ldots, n,$$
and
$$\mu_i = 0 \iff |y_i\rangle \in \ker(C), \qquad \mu_i = 1 \iff |y_i\rangle \in \ker(B).$$
We also know that
$$\langle y_i | C | y_j \rangle = \mu_i \langle y_i | B + C | y_j \rangle = \mu_i \langle u_i | u_j \rangle = \mu_i \delta_{ij}. \qquad (15)$$
The orthonormal eigenvectors and the eigenvalues of the matrix in (11) are $|u_i\rangle$
and $1 + (\lambda - 1)\mu_i$ respectively. Therefore
$$\bigl[I_n + (\lambda - 1)[B + C]^{-1/2} C [B + C]^{-1/2}\bigr]^{-1} = \sum_{i=1}^{n} \frac{|u_i\rangle\langle u_i|}{1 + (\lambda - 1)\mu_i}. \qquad (16)$$
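Combining (16) with the factorization of $[B + \lambda C]^{-1}$ above gives $[B + \lambda C]^{-1} = \sum_i |y_i\rangle\langle y_i| \,/\, (1 + (\lambda - 1)\mu_i)$. The following check (ours, with made-up sizes, assuming NumPy and SciPy) applies this expansion and compares it with a direct inverse:

```python
import numpy as np
from scipy.linalg import eigh

def pencil_inverse(B, C, lam):
    """Apply [B + lam*C]^{-1} = sum_i |y_i><y_i| / (1 + (lam - 1) mu_i),
    built from the generalized eigenvectors of (14)."""
    mu, Y = eigh(C, B + C)                       # C Y = (B + C) Y diag(mu)
    return (Y / (1.0 + (lam - 1.0) * mu)) @ Y.T  # scale columns, recombine

# Quick self-check against a direct inverse (illustrative sizes only).
rng = np.random.default_rng(1)
n = 6
Dop = np.diff(np.eye(n), axis=0)                 # first-difference operator
A = rng.standard_normal((8, n))
B, C = Dop.T @ Dop, A.T @ A
lam = 3.7
assert np.allclose(pencil_inverse(B, C, lam), np.linalg.inv(B + lam * C))
```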
The effort of the preceding section has yielded a closed expression (18) for the solution as a
function of the Lagrange multipliers λ and l. Now we need to know the Lagrange multipliers.
This section shows how to compute the Lagrange multipliers from the constraints. We will
proceed in two steps. In the first step, we only consider the quadratic constraint g(x) 6 0.
In the second step, the positivity bounds x > 0 are also included. The inclusion of positivity
is really an iterative process that involves both steps. The solution is found in the space of
the Lagrange multipliers. Because there are relatively few active constraints, this process
is much faster than finding the solution directly in solution space.
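The closed expression (18) itself is not reproduced above, but the first step can be sketched as one-dimensional root finding in λ: with the positivity bounds ignored, the stationary point of $O(x) + \lambda g(x)$ is $x(\lambda) = -[B + \lambda C]^{-1}(b + \lambda c)$, and λ is chosen so that the quadratic constraint becomes active, $g(x(\lambda)) = 0$. The sketch below is our own illustration of this first step only (assuming SciPy; it is not the paper's full algorithm, which also handles the positivity bounds):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import brentq

def step1_lambda(B, b, C, c, e, lam_min=1e-9, lam_max=1e9):
    """Step 1 only (positivity bounds ignored): find lam > 0 such that the
    stationary point of O(x) + lam * g(x),
        x(lam) = -[B + lam*C]^{-1} (b + lam*c),
    makes the quadratic constraint active, g(x(lam)) = 0.

    Uses the generalized eigensystem of (14) to apply the inverse cheaply
    for every trial lam. Assumes a root exists in [lam_min, lam_max],
    which is the situation described by assumption (3).
    """
    mu, Y = eigh(C, B + C)                    # C y_i = mu_i (B + C) y_i

    def x_of_lam(lam):
        # [B + lam*C]^{-1} v = sum_i y_i <y_i, v> / (1 + (lam - 1) mu_i)
        return -(Y / (1.0 + (lam - 1.0) * mu)) @ (Y.T @ (b + lam * c))

    def g(x):
        return x @ C @ x + 2 * c @ x + e      # ||A x - d||^2 - E^2

    lam = brentq(lambda t: g(x_of_lam(t)), lam_min, lam_max)
    return lam, x_of_lam(lam)
```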
Statement (23) follows from (21) and (24) follows from (22).
The rest of the algorithm takes only a few minutes, because the number of active constraints p is
much smaller than n. Typically, 100 < p < 200. When a series of problems from the same
experiment must be solved, such as in a tomographic imaging series, the computationally
expensive part needs to be done only once.
Acknowledgment
The work presented here was financially supported by the Netherlands Foundation for
Scientific Research (NWO—Nederlandse Organisatie voor Wetenschappelijk Onderzoek).
References
Bertero M, De Mol C and Pike E R 1988 Linear inverse problems with discrete data: II. Stability and regularization
Inverse Problems 4 573–94
Fehmers G C, Kamp L P J and Sluijter F W 1998 A model-independent algorithm for ionospheric tomography: I.
Theory and tests Radio Sci. 33 149–63
Fletcher R 1993 Practical Methods of Optimization 2nd edn (Chichester: Wiley)
Gantmacher F R 1959 The Theory of Matrices (New York: Chelsea)
Gill P E, Murray W and Wright M H 1984 Practical Optimization 4th edn (London: Academic)
Golub G H and Van Loan C F 1989 Matrix Computations 2nd edn (London: Johns Hopkins University Press)
Gull S F and Daniell G J 1978 Image reconstruction from incomplete and noisy data Nature 272 686–90
Hansen P C 1992 Numerical tools for analysis and solution of Fredholm integral equations of the first kind Inverse
Problems 8 849–72
Ingesson L C, Alper B, Chen H, Edwards A W, Fehmers G C, Fuchs J C, Giannella R, Gill R D, Lauro-Taroni
L and Romanelli M 1998 Soft x-ray tomography during ELMs and impurity injection in JET Nucl. Fusion
submitted
Ivanov V K 1962 On linear problems which are not well posed Sov. Math. Dokl. 3 981–3
Oldenburg D W 1994 Practical strategies for the solution of large-scale electromagnetic inverse problems Radio
Sci. 29 1081–99
Parlett B N 1980 The Symmetric Eigenvalue Problem (Englewood Cliffs, NJ: Prentice-Hall)
Phillips D L 1962 A technique for the numerical solution of certain integral equations of the first kind J. Assoc.
Comput. Mach. 9 84–97
Tikhonov A N 1963 Solution of incorrectly formulated problems and the regularization method Sov. Math. Dokl.
4 1035–8
Turchin V F, Kozlov V P and Malkevich M S 1971 The use of mathematical-statistics methods in the solution of
incorrectly posed problems Sov. Phys.–Usp. 13 681–703