
Molecular Physics

ISSN: 0026-8976 (Print) 1362-3028 (Online) Journal homepage: https://www.tandfonline.com/loi/tmph20

A geometric approach to direct minimization

TROY VAN VOORHIS & MARTIN HEAD-GORDON

To cite this article: TROY VAN VOORHIS & MARTIN HEAD-GORDON (2002) A geometric
approach to direct minimization, Molecular Physics, 100:11, 1713-1721, DOI:
10.1080/00268970110103642

To link to this article: https://doi.org/10.1080/00268970110103642

Published online: 01 Dec 2009.

MOLECULAR PHYSICS, 2002, VOL. 100, NO. 11, 1713-1721
Taylor & Francis

A geometric approach to direct minimization


TROY VAN VOORHIS and MARTIN HEAD-GORDON*
Department of Chemistry, University of California, Berkeley, CA 94720, USA

(Received 2 June 2001; accepted 9 October 2001)

The approach presented, geometric direct minimization (GDM), is derived from purely geo-
metrical arguments, and is designed to minimize a function of a set of orthonormal orbitals.
The optimization steps consist of sequential unitary transformations of the orbitals, and
convergence is accelerated using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) approach
in the iterative subspace, together with a diagonal approximation to the Hessian for the
remaining degrees of freedom. The approach is tested by implementing the solution of the
self-consistent field (SCF) equations and comparing results with the standard direct inversion
in the iterative subspace (DIIS) method. It is found that GDM is very robust and converges in
every system studied, including several cases in which DIIS fails to find a solution. For main
group compounds, GDM convergence is nearly as rapid as DIIS, whereas for transition metal-
containing systems we find that GDM is significantly slower than DIIS. A hybrid procedure
where DIIS is used for the first several iterations and GDM is used thereafter is found to
provide a robust solution for transition metal-containing systems.

1. Introduction

The optimization of one-particle orbitals in quantum chemistry is a significant undertaking because, although often one can readily derive a set of equations that the optimal orbitals must satisfy, usually it is not so simple to obtain a rapid solution to these equations. The DIIS method [1, 2] has proved an extremely effective solution to this problem for both SCF and correlated wavefunctions [3-7]. With DIIS, one determines the linear combination of previous gradients that has minimum length and then uses this information to extrapolate a new step that should have a small gradient, and therefore be nearly stationary. DIIS is certainly the most used convergence algorithm in quantum chemistry today, and often it is extraordinary how well it works. However, it is not without weaknesses. The central difficulty is that in the case of near-degeneracies or other pathologies one sometimes finds that the DIIS iterations simply fail to converge to any solution, which clearly is frustrating for the user. This deficiency stems from the fact that DIIS is an extrapolation procedure rather than a minimization algorithm; even a simple conjugate gradient approach is guaranteed eventually to converge to a minimum. Unfortunately, what one really desires is a method that is both robust and rapidly convergent, and conjugate gradient alone requires far too many iterations in practice. Other gradient-based direct minimization algorithms have been proposed [8-10] but have not proved nearly as useful as the DIIS ansatz. Direct minimization algorithms exploiting Hessian-like matrices [11-13] generally converge much faster than even DIIS, but in most cases inverting the Hessian to obtain the Newton-Raphson step is computationally expensive. In this work, we present a direct minimization algorithm that requires only gradient information, converges at essentially the same rate as DIIS for 'easy' cases, and is capable of converging cases where DIIS fails.

2. Theory

Our treatment is based on the geometric approach to orthogonality constraints presented by Edelman et al. [14]. Their formalism is very general, and we review only the crucial results here; the interested reader is directed to the original article for more information. For clarity, we shall specialize to the case of SCF orbital optimization, but it should be clear that generalizations to other methods that require orbital optimization can be made readily.

* Author for correspondence. e-mail: [email protected]

2.1. The SCF equations

Begin with a set of N orthonormal orbitals {φ_i} that are written as a linear combination of a set of N (not necessarily orthogonal) basis functions {χ_μ},

    φ_i = Σ_{μ=1}^{N} C_{μi} χ_μ.    (1)

In SCF theory, the objects of fundamental interest are the orthonormal orbitals

    ψ_i = Σ_{μ=1}^{N} D_{μi} χ_μ    (2)

that minimize an (as yet unspecified) energy functional. This energy actually only depends on the first M of these orthogonal functions; the remaining orbitals contain no electrons and therefore have no effect on the energy. The presence of N - M additional 'virtual' orbitals is convenient, since it turns what would be a rectangular coefficient matrix C into a square matrix.

Since the sets {φ_i} and {ψ_i} are both orthonormal, we can write the optimal set of coefficients as a unitary transformation of the initial set:

    D = CU,    (3)

where U is a unitary matrix. Thus, we seek to minimize the energy E(U, α) as a function of the unitary matrix U, holding fixed any other degrees of freedom, α, that the energy may depend upon (e.g. nuclear degrees of freedom, coupled-cluster amplitudes). One can obtain Hartree-Fock or Kohn-Sham theory from this energy, depending on how E(U, α) is defined. Since none of what follows depends on the distinction between these forms for the energy, we shall leave the definition of E(U, α) intentionally vague so that our SCF algorithm will apply without modification to both Hartree-Fock and Kohn-Sham approaches. We can enforce the unitarity constraint

    U†U = 1,    (4)

where 1 represents the unit matrix, by defining a matrix of Lagrange multipliers ε and finding the stationary points of the associated functional

    L(U, α) = E(U, α) - Tr[ε·(U†U - 1)].    (5)

The stationary equations give the unitarity condition on U (equation (4)), and

    ∂E/∂U = F(U) = ε·U†,    (6)

where we have defined the Fock matrix, F, which is in general a nonlinear function of U. Equation (6) is the familiar Roothaan equation [15] that leads to an iterative procedure where one diagonalizes F to obtain a new estimate for U, which is then used to build a new F, which may be diagonalized ... This is one particular formulation of the SCF optimization problem we wish to solve.

We note here that for both DFT and HF the energy does not depend on the M occupied orbitals individually, but only on the space that they span. A manifold with this type of invariance is called a Grassmann manifold, and many of the results we shall present depend upon the existence of such an invariant subspace. This excludes certain generalized valence bond [16] and optimized effective potential methods [17], but includes most coupled-cluster approaches [7, 18] and many active-space correlation models [6, 19]. Finally, often in what follows it is useful to exploit the distinction between 'occupied' and 'virtual' subspaces and depict matrices in terms of their occupied-occupied (oo), virtual-virtual (vv), occupied-virtual (ov) and virtual-occupied (vo) blocks. For example,

    A = ( A_oo  A_ov )
        ( A_vo  A_vv ).    (7)

2.2. A geometric approach

If we consider the space of all unitary matrices as a manifold embedded in the larger space of all matrices, we see that the constraint (4) forces the manifold to be curved. This can be illustrated readily by considering the analogous problem in 3 dimensions of minimizing a function subject to the constraint

    x†x = 1.    (8)

Of course, the manifold of possible solutions in this case is the unit sphere, which is a curved surface.

If we make an infinitesimal change U → U + δU and apply the unitarity constraint (4) to the new matrix, we find that the variation δU must satisfy the equation

    U†δU + δU†U = 0,    (9)

where terms second order in the infinitesimal have been neglected. Notice that this constraint is linear in δU; thus the subspace of allowable infinitesimal variations is a linear space (the tangent space) that has no curvature. To first order, movement in this space is identical to movement on our curved manifold. Thus, the tangent space is the flat space that is locally most like our curved manifold, and it plays a central role in the translation of any algorithm designed to work on a flat manifold into one that works in a curved space.

Since the SCF energy is invariant to transformations of the occupied and virtual subspaces amongst themselves, changes in the energy must therefore be due to ov and vo transformations alone. In this context, we can divide the space of tangent vectors into 'horizontal' vectors (H) that do not affect the energy and 'vertical' vectors (V) that can change the energy (here our convention for 'vertical' and 'horizontal' vectors is opposite to the standard convention),

    H = ( H_oo  0    )        V = ( 0     V_ov )
        ( 0     H_vv ),           ( V_vo  0    ).    (10)

Clearly, in order to optimize the SCF orbitals we need deal only with the vertical space. In the special case U = 1, the tangent space condition (9) implies that
variations in U must be skew-symmetric. Thus, in this important case, the vertical vectors are defined by their ov block alone,

    V = ( 0       V_ov )
        ( -V_ov†  0    ).    (11)

The vertical tangent vectors allow us to create geodesics. A geodesic is the shortest path in our curved manifold that connects two given points [20]. It is also the 'straightest' curve available on the surface, since additional curvature would tend to lengthen the path. Hence, on a curved surface, one discards the notion of a straight line and replaces it with that of a geodesic. For SCF the geodesics depend only on the vertical vectors; horizontal moves do not affect the energy, and thus any transformation in the horizontal space will tend to unnecessarily lengthen the path to the minimum. Given a vertical vector V, it may be shown [14] that the geodesic initially tangent to V may be written explicitly as

    U(V) = e^V.    (12)

Therefore, a given vertical tangent vector V defines the initial direction of a unique geodesic.

If we define X = V_ov V_ov† and Y = V_ov† V_ov, it is easily verified that [21]

    U(V) = ( cos X^{1/2}                 X^{-1/2} sin X^{1/2} V_ov )
           ( V_vo X^{-1/2} sin X^{1/2}   cos Y^{1/2}               ).    (13)

This allows the geodesic to be evaluated in O(N³) time, with the key step being the diagonalizations of X and Y.

We can now formulate steepest descent on the Grassmann manifold:

(1) Obtain an initial set of (orthogonal) orbital coefficients C_0.
(2) Compute the gradient, G = ∂E/∂V, evaluated with the current set of orbitals. This is a vertical vector in the tangent space.
(3) Minimize the energy along the geodesic defined by G. That is, minimize

    E(γ) = E(e^{γG}, α)    (14)

as a function of γ. Let γ_0 denote the optimal value of γ.
(4) Update the orbitals using

    C_{i+1} = C_i e^{γ_0 G}.    (15)

(5) If convergence has not been achieved, return to step 2.

Note that this requires one to perform a sequence of unitary transformations rather than attempting to write the final set of orbital coefficients as a single unitary transformation of the initial set. That is, the final set of orbitals is written

    C = C_0 e^{Δ_0} e^{Δ_1} e^{Δ_2} ... e^{Δ_n},    (16)

where each Δ_i is a scaled gradient, rather than

    C = C_0 e^{Δ},    (17)

where Δ must be determined. Clearly the former approach can be viewed as a special case of the latter where the orbitals are 're-set' at each iteration. This amounts to shifting the origin of our reference frame to the current set of orbitals as opposed to referencing it to the arbitrary initial orbitals.

2.3. Approximate Hessian approach

The use of the gradient as the step direction, as in steepest descent, is far from optimal in practice; to improve upon this step, one must utilize second-derivative information. Approximate Hessian methods accomplish this by employing the Newton-Raphson-like step

    S = -BG,    (18)

where G is the gradient and B is an approximation to the inverse Hessian constructed from vectors collected in previous iterations. In a curved manifold, one needs to be sure that the previous vectors are contained in the current tangent space before constructing the approximate Hessian; otherwise the resulting step direction might not be tangent to our surface, making it impossible to construct the relevant geodesic. It is at this point that the formulation of 're-setting' the orbitals at each iteration becomes useful. Specifically, since re-setting the orbitals amounts to setting U = 1 at the beginning of each iteration, we have at every iteration that the tangent space is of the form of equation (11), and therefore the tangent vectors from previous iterations are guaranteed to be in the tangent space at the current point. If we had chosen a fixed frame, the tangent spaces could not possibly be the same, because this would imply that our surface was flat.

Now, the fact that the tangent spaces of subsequent iterations are the same does not necessarily imply that individual vectors in these spaces may be identified; there could be some significant internal shuffling of tangent vectors between iterations that does not affect the space they span. Rigorously, one should obtain vectors in the current frame by transporting the previous vectors along the relevant geodesic, making sure to keep the angle between each vector and the geodesic fixed. Such a process is called parallel transport, and we may denote the vertical vector V after parallel transport by TV. For
a Grassmann manifold, Edelman et al. [14] showed that the parallel transport of V along the geodesic generated by Δ is given by

    TV = e^Δ V.    (19)

However, the unitary transformation e^Δ is absorbed into the definition of C when we re-set our orbitals, and therefore the parallel-transported vector TV in the rotated frame is identical to the original vector V. We stress that this is a special feature of the Grassmann manifold in particular, and of methods with invariant subspaces in general. In situations with no invariant subspaces, such simplifications do not occur and parallel transport must be accounted for explicitly. However, for the Grassmann case, setting U = 1 every iteration successfully rotates the frame of reference so that the new tangent space in the new frame is identical to the old tangent space in the old frame, and parallel transport is unnecessary.

One may now apply the BFGS update scheme [22], in which the approximate inverse Hessian for the (i+1)th iteration can be written as

    B_{i+1} = B_i + [(SG + GBG)/SG²] S_i S_i† - (1/SG)(B_i δG_{i+1} S_i† + S_i δG_{i+1}† B_i),    (20)

where δG_{i+1} is the change in the gradient vector from the previous iteration and the intermediates

    SG = S_i · δG_{i+1}   and   GBG = δG_{i+1} · B_i · δG_{i+1}    (21)

have been used. The BFGS Hessian has two nice properties that make it well suited to minimization problems. First, it minimizes a quadratic potential with the minimum number of gradient evaluations, and therefore is expected to have superlinear convergence. Second, the BFGS Hessian is positive definite. Hence, given the choice between reducing the energy in one direction and reducing the gradient in another, the BFGS prescription will tend to preferentially reduce the energy, which clearly is desirable when dealing with a minimization problem.

Storing the BFGS Hessian is not difficult, because one needs only to compute it in the subspace spanned by gradients and steps from previous iterations. This is easily accomplished by orthonormalizing the vectors from previous iterations and expanding everything in terms of these orthogonal basis vectors [8]. Since the size of the subspace is not expected to depend strongly on the size of the system, the construction and storage of the approximate Hessian is never prohibitive.

Unfortunately, BFGS by itself does not accelerate convergence enough to be competitive with DIIS. This is because the orbital rotation space has many dimensions and therefore BFGS requires many iterations in order to build up enough information about the Hessian. It would be ideal if we could incorporate an approximate Hessian for the degrees of freedom not spanned by previous iterations and simply update this Hessian using BFGS as more and more degrees of freedom are explored. One way to do this is to diagonalize the oo and vv blocks of the Fock matrix. The approximate Hessian can then be chosen to be diagonal and equal to the difference in orbital eigenvalues {ε} at the current iteration [8, 9, 23]:

    B_{ia,jb} = 2(ε_a - ε_i) δ_ij δ_ab,    (22)

where i, j and a, b represent occupied and virtual orbital indices, respectively. However, following the work of Bacskay [11, 12], we note that the optimal diagonal Hessian actually contains an energy shift,

    B_{ia,jb} = 2(ε_a - ε_i + δE) δ_ij δ_ab.    (23)

Within Bacskay's quadratic procedure, the value of this shift is equal to the change in energy that results from a Newton-Raphson step using the shifted Hessian. Since the energy that would be obtained after application of the Hessian clearly cannot be determined before the Hessian has been constructed, it must be estimated in practice. We find that the energy change from the previous iteration is a reasonable estimator, and this choice actually greatly improves the rate and stability of convergence in the early iterations. In order to interface this approximate Hessian with the BFGS scheme, we note that before any iterations have occurred the BFGS Hessian is the unit matrix. Following [8], we can transform to the set of coordinates where our diagonal Hessian (23) is also the unit matrix,

    Ṽ_ia = [2(ε_a - ε_i + δE)]^{1/2} V_ia,    (24)

which we will call the energy-weighted coordinates (EWCs). The BFGS prescription (20) can then be applied in terms of these coordinates.

To compute the EWCs (equation (24)) one must find the orbitals that diagonalize the oo and vv blocks of the Fock matrix: the pseudo-canonical orbitals. Since this does not affect the energy, the relevant transformation can be written as a step in the horizontal space,

    C = C e^H;    H = ( H_oo  0    )
                      ( 0     H_vv ).    (25)
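The update (20) is the standard BFGS formula for an approximate inverse Hessian, and its key property, exact minimization of an n-dimensional quadratic in at most n steps under exact line searches, is easy to check on a flat test problem. The sketch below is illustrative only (the quadratic, names and dimensions are invented; this is not the Q-Chem implementation): it applies the step (18) and the update (20), starting from the unit matrix as the text prescribes.

```python
import numpy as np

def bfgs_minimize_quadratic(Q, b, iters):
    # Minimize f(x) = 0.5 x^T Q x - b^T x using the step S = -B G of
    # equation (18) and the inverse-Hessian update of equation (20),
    # starting from B = 1 and using an exact line search.
    n = len(b)
    x = np.zeros(n)
    B = np.eye(n)                          # approximate inverse Hessian
    g = Q @ x - b                          # gradient of f
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        d = -B @ g                         # eq (18)
        alpha = -(g @ d) / (d @ Q @ d)     # exact line search on a quadratic
        s = alpha * d                      # the step actually taken
        x = x + s
        g_new = Q @ x - b
        y = g_new - g                      # change in gradient, delta-G
        sy = s @ y                         # intermediate SG of eq (21)
        yBy = y @ B @ y                    # intermediate GBG of eq (21)
        B = (B + (sy + yBy) / sy**2 * np.outer(s, s)
               - (np.outer(B @ y, s) + np.outer(s, B @ y)) / sy)
        g = g_new
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4.0 * np.eye(4)              # symmetric positive definite
b = rng.standard_normal(4)

x = bfgs_minimize_quadratic(Q, b, iters=4)
assert np.allclose(Q @ x, b, atol=1e-8)    # exact minimum after n = 4 steps
```

On the curved orbital manifold the same update is applied only after the stored gradients and steps have been brought into a common tangent frame, which is precisely what the re-setting machinery described above arranges.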
It is easy to verify that taking a gradient step before making the pseudo-canonical transform is not equivalent to performing the same operations in the opposite order,

    C_0 e^V e^H ≠ C_0 e^H e^V,    (26)

and therefore we must be very careful about the order of these two steps. In order to convert to the current set of EWCs, we must pseudo-canonicalize first and then compute the gradient; however, for vectors from previous iterations clearly this is not possible. Thus, in order to treat the previous steps and gradients on an equal footing with the current gradient, we must determine how the pseudo-canonical rotation affects vertical tangent vectors from previous iterations. Specifically, we desire the vector Ṽ that satisfies

    e^H e^Ṽ = e^V e^H    (27)

for an arbitrary vertical vector V (e.g. the gradient). Clearly Ṽ is the analogue of V in the current frame. It is readily verified that the correct solution is

    Ṽ = e^{H†} V e^H.    (28)

This is just a consequence of the standard transformation rules for a matrix under a unitary transformation of the coordinates. Therefore, it is possible to update the EWCs at every iteration so long as one transforms the gradients and steps from previous iterations using equation (28). Clearly, changing the EWCs at every iteration is beneficial, as it allows one always to use the 'best' approximate Hessian at the current point. Previous algorithms [8, 9, 23] have been forced to retain the EWCs from the initial set of orbitals. This is fine as long as the orbitals do not change very much, but is a serious limitation in general, because the quality of the diagonal Hessian (23) will degrade as one moves away from the reference point.

There is a simple geometric interpretation of this transformation in terms of trajectories on our curved manifold, as depicted in cartoon form in figure 1. During the first iteration, we begin with an initial set of orbitals C_0 and follow a vertical step V to obtain a new set of orbitals, C_1. We then pseudo-canonicalize the orbitals by following a horizontal vector H. As is clear in the figure, there is no vertical path from the initial orbitals to the current orbitals, C_1. However, one may first apply the horizontal transform H to C_0 to obtain a new set of initial orbitals, C_0', such that the initial energy is unaffected. Then equation (28) allows us to determine a new vertical step Ṽ that connects these orbitals to the current set. By repeated application of this procedure, one can transform a sequence of alternating vertical and horizontal steps {V_1, H_1, V_2, H_2, V_3, H_3, ...} into a trajectory containing only vertical steps {Ṽ_1, Ṽ_2, Ṽ_3, ...} that arrives at the same final set of orbitals, provided that the set of initial orbitals is transformed by the sequence of horizontal transformations {H_1, H_2, H_3, ...}. From a geometric point of view, this trajectory of vertical steps has taken the 'straightest' path to the current point while still remaining on the surface. Since the straightest path also has the least curvature, convergence acceleration should be most efficient along this path.

Figure 1. A vertical step followed by a horizontal step can be rewritten as the horizontal step followed by a transformed vertical step.

In summary, our algorithm for SCF orbital optimization looks like this:

(1) Obtain an initial set of (orthogonal) orbital coefficients C_0.
(2) Pseudo-canonicalize the orbitals. Save the transformation matrix.
(3) Transport the gradients {G_i} and steps {S_i} from previous iterations to the current set of pseudo-canonical orbitals using equation (28).
(4) Compute the gradient, G_{i+1} = ∂E/∂V. It is easily verified that G_ov = F_ov, where F is the pseudo-canonical Fock matrix.
(5) Convert the gradients and steps into energy-weighted coordinates via equation (24), with the diagonal Hessian B of equation (23).
(6) Build the BFGS Hessian (equation (20)) and use it to compute the approximate Newton-Raphson step (equation (18)).
(7) Convert the BFGS step S_{i+1} into a rotation.
(8) Update the orbitals using the resulting unitary transformation.
(9) If convergence has not been achieved, return to step 2.

We have found that this algorithm is extremely robust. The only weakness is that, since BFGS gives only an approximate step, there is the possibility that the energy will go up on a given iteration. If this happens, it is useful to perform an approximate line search along the previous step, making use of the information at the current and previous points. This takes care of the possibility that the BFGS step was in the right direction but just too long. If even this fails to lower the energy, it almost always indicates that the step, and therefore the BFGS matrix, is bad. Thus, if the line search fails, it is best to re-set the BFGS space, back up to the best previous point and simply take a steepest descent step.

Steps 1-9 together with the failsafe option of a line search define the geometric direct minimization (GDM) algorithm for the solution of the SCF equations. It should be mentioned that most of what has been developed here presumes there is one set of orbitals that is desired, as in an RHF calculation. In the case of spin-unrestricted orbitals, this is generalized trivially by simply applying rotations to the alpha and beta orbitals separately. The restricted open-shell case is more complicated and has not been implemented, although in principle there is no obstacle to such an approach.

3. Results

The GDM algorithm has been implemented in Version 2.0 of the 'Q-Chem' software package [24]. For comparison, we compare the rate and reliability of convergence for the current algorithm with the results of Pulay's DIIS procedure [1, 2]. All closed-shell molecules used spin-restricted orbitals and all open-shell species were unrestricted. The iterations were deemed converged if the energy change fell below 1 × 10⁻¹⁰ E_h and the RMS gradient fell below 1 × 10⁻⁷ in less than 256 iterations. For the purposes of discussion we shall refer to the lowest energy obtained by any of the algorithms as the 'global minimum', although we cannot exclude the possibility that there might be a solution still lower in energy that none of these approaches can find. Finally, to present an equal comparison, the cases where DIIS fails to converge have been removed from the statistical analysis and will be discussed separately.

3.1. The G2 set

We have performed SCF calculations for the 56 molecules whose dissociation energies were utilized in the calibration of the G2 method [25]. The set contains a variety of main group compounds which should provide an unbiased picture of the performance of the different convergence acceleration techniques. We chose the 6-311++G** basis [26, 27] and the generalized Wolfsberg-Helmholtz (GWH) guess [28] for all calculations, and have considered both the Hartree-Fock and the B3LYP [29] methods.

A statistical summary of the results obtained for these molecules is contained in table 1, from which it can be seen that generally DIIS converges quite rapidly for these systems, but fails to find a solution in two cases (B3LYP for the OH and CH radicals). GDM is quite competitive with the standard theory, taking on average only about 3 iterations longer to achieve convergence, and is successful in converging the cases where DIIS fails. Hence, for these main group systems GDM improves the robustness of DIIS without significantly increasing the cost.

GDM demonstrates a tendency to find local minima, whereas DIIS seems to find the global minimum more consistently. The reason for this is clear; in the formulation of the current method we have rigorously enforced the orthogonality constraint (4), and thus the search takes place only in the space of possible solutions. On the other hand, the DIIS extrapolated Fock matrix may not be derivable from a set of orthonormal orbitals; orthonormality is enforced because the new orbitals are chosen as the eigenvectors of the extrapolated Fock matrix. Thus, DIIS can temporarily leave the surface of constraint in order to obtain the new extrapolated orbitals. In cases where the surface is bumpy, this allows one to 'tunnel through' barriers rather than having to climb over them. Clearly this increases the ability of DIIS to escape from regions near a local minimum, and explains the observed tendency to arrive at lower energy solutions.

Table 1. Convergence statistics for the 56 molecules in the G2 test set [25].

Method            Average iterations   Maximum iterations   Local minima   Did not converge
DIIS/HF           13.4                 26                   0              0
DIIS/B3LYP        11.8                 17                   0              2
GDM/HF            16.3                 42                   5              0
GDM/B3LYP         16.3                 58                   5              0
DIIS-GDM/HF       13.8                 24                   0              0
DIIS-GDM/B3LYP    13.9                 24                   0              0
The propensity of GDM to find local minima is beneficial in many cases. It often happens that the lowest energy solution for a correlated calculation such as MP2 or CCSD is not the same as for the HF solution. In these cases, it is highly desirable to have a method that can be induced to land consistently on this solution. Similarly, when one considers energies collected at several different geometries, as might occur in a geometry optimization or during a molecular dynamics simulation, it is usually understood that one wishes to consider the same state at all geometries, regardless of whether it is the global minimum for the current structure. In both these cases, GDM would seem to be the preferable alternative.

Sometimes, of course, convergence to the global minimum is exactly what is desired. In these cases, a robust approach can be formulated by performing DIIS extrapolation until the RMS gradient is below, say, 0.01. In this scheme, DIIS effectively acts to pre-converge the orbitals that are input into the GDM procedure. We present the results for this approach under the heading DIIS-GDM in table 1, and it is seen that the hybrid approach retains the rapid convergence of both DIIS and GDM for these cases, combining the robustness of the GDM algorithm with the ability of DIIS to find the lowest energy solution.

3.2. Transition metal complexes

Transition metal systems present an interesting challenge for SCF convergence, since often there are extremely large numbers of low energy critical points that an algorithm must sort through in order to arrive at a suitable minimum. This makes it almost impossible for a convergence algorithm to consistently pick out the global minimum from among the swarm of candidates. This can be aided by tailoring the initial guess to treat a battery of organometallic systems well [30], and so our task for these cases is focused on discovering an algorithm that converges quickly and consistently to a minimum, with the presumption that the convergence can be shunted towards the global minimum by suitable adjustment of the initial guess.

To see how well GDM deals with the abundance of critical points and near-degeneracies in these cases, we have run calculations on the first-row transition metal carbonyl (MCO⁺) and dicarbonyl (M(CO)₂⁺) cations of [31] and the first-row transition metal-methylene cations (MCH₂⁺) of [32]. Our calculations employed the 6-31G* basis [33] and the GWH guess [28]. Since we make no attempt here to include relativistic corrections, we use the non-relativistic optimized geometries [31, 32].

The convergence of GDM is slowed for these molecules due to the presence of saddle points. With GDM, often one observes a rapid decrease in step size as a saddle point is approached, which means it takes very many small steps to traverse a saddle point and continue with the minimization. It is not completely clear why DIIS has no such difficulty; in some cases the problem is moot because DIIS simply converges to the saddle point, which clearly is undesirable, but in other cases DIIS seems to avoid these problem areas.

For transition metal complexes, the robust convergence of GDM can be combined with DIIS's ability to deal with saddle points using the hybrid DIIS-GDM approach. As can be seen in table 2, DIIS-GDM converges at essentially the same rate as DIIS, and the robustness of GDM is completely retained. Further, it is interesting to note that DIIS-GDM tends, on average, to converge to even lower energy solutions than either DIIS or GDM individually. This would seem to result from the fact that running a few DIIS iterations effectively supplies GDM with an improved initial guess that lies within the basin of attraction of a different (and more reliable) minimum.

Table 2. Convergence statistics for the 27 first row transition metal complexes MCO⁺, M(CO)₂⁺ and MCH₂⁺.

Method            Average iterations   Maximum iterations   Local minima   Did not converge
DIIS/HF           33.1                 101                  8              4
DIIS/B3LYP        26.1                 58                   9              2
GDM/HF            88.6                 216                  13             1
GDM/B3LYP         77.5                 170                  7              0
DIIS-GDM/HF       30.8                 56                   3              0
DIIS-GDM/B3LYP    31.8                 104                  3              0

3.3. Difficult cases

It is instructive to present in detail those cases where DIIS fails to converge. The results of GDM and DIIS-GDM optimization for these systems are listed in table 3. These systems show the same general features that have been observed previously. GDM converges rapidly, except for one case (B3LYP for ScCO⁺) where the procedure encounters a saddle point between the initial guess and the final result. Running DIIS for the first several iterations pushes the energy below the offending saddle point and convergence is then much more rapid. In other cases, the rate of convergence of the hybrid procedure is comparable with GDM. These
A statistical summary of the convergence rates is pre- systems illustrate the tendency of DIIS-GDM to find
sented in table 2, showing that GDM is significantly lower energy solutions than GDM alone, as the hybrid
slower than DIIS for these cases, although it still suc- approach finds a lower solution for five of the seven
ceeds in converging in all cases where D I N fails. The cases. It is important to recognize that many of these
1720 T. Van Voorhis and M. Head-Gordon

cases could also have been converged by using standard 'tricks of the
trade' in conjunction with DIIS, for example, by employing a level
shift or damping the DIIS iterations. However, we think the GDM
approach is preferable to these techniques because it presents a
unified solution to these problems.

Table 3. Convergence information for molecules where DIIS failed to
converge (energies in Eh).

                      GDM                    DIIS-GDM
Molecule        Iterations   Energy      Iterations   Energy
CH/B3LYP            22      -38.4941         23      -38.4941
OH/B3LYP            27      -75.7624         22      -75.7624
NiCH2+/HF           54    -1545.2996         33    -1545.2996
NiCH2+/B3LYP        29    -1547.0374         33    -1547.0561
CoCO+/HF            31    -1493.6052         52    -1493.6643
NiCO+/HF            34    -1619.0480         49    -1619.0866
ScCO+/B3LYP        101     -873.6381         61     -873.6831
Fe(CO)2+/HF         32    -1487.3737         58    -1487.4712

4. Discussion
We have presented no estimates of the cost of this approach relative
to DIIS. GDM is designed for the intermediate size regime where the
cost is dominated by Fock builds rather than matrix manipulations,
and therefore such considerations are not relevant. This covers the
vast majority of computations done today. For extremely large
molecules, linear scaling methods are a necessity, and density
matrix-based treatments are expected to be much more efficient due to
the exponential decay of the density for non-conducting systems. Much
of the present algorithm could be rephrased readily in terms of the
effects of the unitary transformation on the one-particle density
matrix rather than the orbitals. Then, the locality of the density
matrix in the atomic orbital basis could be exploited to perform
matrix multiplications in a linear scaling fashion [34]. The only
obstacle to this approach is that the orbital Hessian (23) requires
diagonalization of the oo and vv blocks of the Fock matrix, and there
is no well defined prescription for diagonalizing sparse matrices in
linear time. If an alternative approximate Hessian could be obtained
that did not require pseudo-canonical orbitals, the current algorithm
could be translated very readily into a linear scaling density
matrix-based scheme, very similar in spirit to that of [34].

There is one significant alteration that would need to be made in
order to apply this approach to ROHF optimization or to active space
correlation methods like CASSCF [19]. In both of these cases, instead
of having two invariant subspaces (occupied and virtual), one has
multiple invariant subspaces. In the case of ROHF, one has doubly
occupied, singly occupied and unoccupied subspaces that are invariant
to rotations within themselves. In the case of CASSCF [19], one has
inactive occupied (o), inactive virtual (v) and active (a) subspaces.
Considering the CASSCF case, the logical generalization of updating
the orbitals by sequential ov rotations,

    C_new = C e^(Δ_ov),    (32)

is to perform a sequence of three rotations at each step,

    C_new = C e^(Δ_ov) e^(Δ_oa) e^(Δ_av),    (33)

where the order of the rotations is arbitrary but should be the same
at each iteration. Each of these three rotation classes could then be
extrapolated sequentially using manipulations that are completely
analogous to those presented here. For example, after the ov rotation
has been performed, the vectors from previous iterations could be
translated to the new coordinate frame using

    Δ̃ = e^(-Δ_ov) Δ e^(Δ_ov).    (34)

The BFGS and approximate orbital Hessians could then be applied in
exactly the same way, as has been done in [9]. The ability to perform
an efficient search without having to resort to explicit calculation
of the Hessian is extremely important for optimized orbital coupled
cluster methods [18, 7, 6] where, as for SCF, the formation and
inversion of the full Hessian is expensive.

5. Conclusions
In this work we have used basic principles of differential geometry
to formulate a new approach to energy minimization for methods that
depend on a set of orthonormal orbitals. The resulting approach, GDM,
is competitive with DIIS for many systems and capable of converging
many 'problem cases' where DIIS fails to find a solution. It is
satisfying that this can be achieved by a rigorous geometric argument
that does not resort to heuristic damping factors or level shifts.

For systems where DIIS converges, GDM shows a mild tendency to
converge to higher energy solutions than DIIS. This happens primarily
when the initial guess is far from the global minimum, and it is
argued that this is a desirable characteristic in many situations.
However, in case the GDM solution is not acceptable, we also present
a hybrid DIIS-GDM approach that regularly gives energies that are as
low as the DIIS solution but that retains the robustness of GDM. The
hybrid procedure also tends to accelerate convergence significantly
in cases where GDM is slow to converge.

It is clear that the source of the robustness of these procedures is
the adherence to the direct minimization
strategy, which requires the energy to go down at every iteration.
Whereas an extrapolation technique such as DIIS can oscillate between
two fixed points, this is not allowed in a direct minimization, since
one of the two points must be higher in energy and thus one half of
the oscillation must involve an uphill step.

The GDM strategy has many potential applications due to the
prevalence of the orbital optimization problem in electronic
structure theory. One interesting direction involves the simultaneous
updating of nuclear and electronic degrees of freedom [8,23], which
would have clear relevance for Car-Parrinello molecular dynamics
[35], where the size of the timestep is determined mainly by the
radius of reliable extrapolation of the Kohn-Sham orbitals along the
trajectory. Since the current algorithm is simply the geometric
generalization of a trajectory in the presence of an orthogonality
constraint, it stands to reason that the success of this method for
single-point problems could be extended readily to deal with
molecular dynamics simulations.

Another interesting application of these principles would be to test
similar approaches for active space correlation models [6,19] that
involve only a minor extension of the formulae presented here.
Previous work [9] suggests that this avenue should be fruitful.

This research was supported by a grant from the National Science
Foundation (CHE-9981997).

References
[1] Pulay, P., 1980, Chem. Phys. Lett., 73, 393.
[2] Pulay, P., 1982, J. Comput. Chem., 3, 556.
[3] Hamilton, T. P., and Pulay, P., 1986, J. chem. Phys., 84, 5728.
[4] van Lenthe, J. H., Verbeek, J., and Pulay, P., 1991, Molec. Phys., 73, 1159.
[5] Muller, R. P., et al., 1994, J. chem. Phys., 100, 1226.
[6] Krylov, A. I., Sherrill, C. D., Byrd, E. F. C., and Head-Gordon, M., 1998, J. chem. Phys., 109, 10669.
[7] Sherrill, C. D., Krylov, A. I., Byrd, E. F. C., and Head-Gordon, M., 1998, J. chem. Phys., 109, 4171.
[8] Head-Gordon, M., and Pople, J. A., 1988, J. phys. Chem., 92, 3063.
[9] Chaban, G., Schmidt, M. W., and Gordon, M. S., 1997, Theoret. Chim. Acta, 97, 88.
[10] Cancès, E., and Le Bris, C., 2000, Intl. J. Quantum Chem., 79, 82.
[11] Bacskay, G. B., 1981, Chem. Phys., 61, 385.
[12] Bacskay, G. B., 1982, Chem. Phys., 65, 383.
[13] Sano, T., and I'Haya, Y. J., 1991, J. chem. Phys., 95, 6607.
[14] Edelman, A., Arias, T. A., and Smith, S., 1998, SIAM J. Matrix Anal. Applic., 20, 303.
[15] Roothaan, C. C. J., 1951, Rev. Mod. Phys., 23, 69.
[16] Bobrowicz, F. B., and Goddard, W. A., 1977, Methods of Electronic Structure Theory, Vol. 3, edited by H. F. Schaefer III (New York: Plenum Press).
[17] Krieger, J. B., Li, Y., and Iafrate, G. J., 1992, Phys. Rev. A, 46, 5453.
[18] Scuseria, G. E., and Schaefer, H. F., 1987, Chem. Phys. Lett., 142, 354.
[19] Roos, B. O., Taylor, P. R., and Siegbahn, P. E. M., 1980, Chem. Phys., 48, 157.
[20] do Carmo, M. P., 1976, Differential Geometry of Curves and Surfaces (Prentice-Hall).
[21] Hutter, J., Parrinello, M., and Vogel, S., 1994, J. chem. Phys., 101, 3862.
[22] The BFGS method is discussed, e.g., in the popular Numerical Recipes books available at www.nr.com.
[23] Fischer, T. H., and Almlöf, J., 1992, J. phys. Chem., 96, 9768.
[24] Kong, J., White, C. A., Krylov, A. I., Sherrill, C. D., Adamson, R. D., Furlani, T. R., Lee, M. S., Lee, A. M., Gwaltney, S. R., Adams, T. R., Dachsel, H., Zhang, W., Korambath, P. P., Ochsenfeld, C., Gilbert, A. T. B., Kedziora, G. S., Maurice, D. R., Nair, N., Shao, Y., Besley, N. A., Maslen, P. E., Dombroski, J. P., Baker, J., Byrd, E. F. C., Van Voorhis, T., Oumi, M., Hirata, S., Hsu, C.-P., Ishikawa, N., Florian, J., Warshel, A., Johnson, B. G., Gill, P. M. W., Head-Gordon, M., and Pople, J. A., 2000, Q-Chem 2.0: a high performance ab initio electronic structure program package; J. comput. Chem., 21, 1532.
[25] Curtiss, L. A., et al., 1991, J. chem. Phys., 94, 7221.
[26] Krishnan, R., Binkley, J. S., Seeger, R., and Pople, J. A., 1980, J. chem. Phys., 72, 650.
[27] Clark, T., Chandrasekhar, J., and Schleyer, P. v. R., 1983, J. Comput. Chem., 4, 294.
[28] Dupuis, M., and King, H. F., 1977, Intl. J. Quantum Chem., 11, 613.
[29] Becke, A. D., 1993, J. chem. Phys., 98, 5648.
[30] Vacek, G., Perry, J. K., and Langlois, J.-M., 1999, Chem. Phys. Lett., 310, 189.
[31] Barnes, L. A., Rosi, M., and Bauschlicher, Jr., C. W., 1990, J. chem. Phys., 93, 609.
[32] Bauschlicher, Jr., C. W., et al., 1992, J. phys. Chem., 96, 6969.
[33] Hariharan, P. C., and Pople, J. A., 1973, Theoret. Chim. Acta, 28, 213.
[34] Helgaker, T., Larsen, H., Olsen, J., and Jørgensen, P., 2000, Chem. Phys. Lett., 327, 397.
[35] Car, R., and Parrinello, M., 1985, Phys. Rev. Lett., 55, 2471.
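The sequential-rotation update of equations (32) and (33) in section 4 is simple to verify numerically. The sketch below is our own illustration, not code from the paper or from any program cited above; the subspace sizes, the helper name `block_generator`, and the random rotation parameters are arbitrary assumptions. It builds an antisymmetric generator for each subspace pair, applies the three matrix exponentials in a fixed order, and checks that the orbitals remain orthonormal.

```python
# Illustration only: sequential block rotations as in equations (32)-(33).
# Subspace sizes and random rotation parameters are arbitrary assumptions.
import numpy as np
from scipy.linalg import expm  # matrix exponential

def block_generator(n, rows, cols, kappa):
    """Antisymmetric generator that mixes only the given pair of subspaces."""
    A = np.zeros((n, n))
    A[np.ix_(rows, cols)] = kappa
    A[np.ix_(cols, rows)] = -kappa.T
    return A

rng = np.random.default_rng(0)
n = 6
o, a, v = [0, 1], [2, 3], [4, 5]  # inactive occupied, active, inactive virtual

# Orthonormal starting orbitals (columns of C).
C, _ = np.linalg.qr(rng.standard_normal((n, n)))

# One CASSCF-style step, C_new = C e^(D_ov) e^(D_oa) e^(D_av),
# with the three rotation classes applied in a fixed order.
for rows, cols in [(o, v), (o, a), (a, v)]:
    kappa = 0.1 * rng.standard_normal((len(rows), len(cols)))
    C = C @ expm(block_generator(n, rows, cols, kappa))

# Each factor is exactly unitary, so C stays orthonormal.
print(np.allclose(C.T @ C, np.eye(n)))  # True
```

Because each exponential of an antisymmetric generator is exactly orthogonal, orthonormality is maintained to machine precision at every step without re-orthogonalization, which is what makes the sequential update a valid move on the constraint surface.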
