
BFGS convergence to nonsmooth minimizers of convex functions

arXiv:1703.06690v1 [math.OC] 20 Mar 2017

J. Guo∗   A.S. Lewis†

March 21, 2017

Abstract
The popular BFGS quasi-Newton minimization algorithm under reason-
able conditions converges globally on smooth convex functions. This result
was proved by Powell in 1976: we consider its implications for functions that
are not smooth. In particular, an analogous convergence result holds for func-
tions, like the Euclidean norm, that are nonsmooth at the minimizer.

Key words: convex; BFGS; quasi-Newton; nonsmooth.


AMS 2000 Subject Classification: 90C30; 65K05.

1 Introduction
The BFGS (Broyden-Fletcher-Goldfarb-Shanno) method for minimizing a smooth
function has been popular for decades [6]. Surprisingly, however, it can also be
an effective general-purpose tool for nonsmooth optimization [3]. For twice con-
tinuously differentiable convex functions with compact level sets, Powell [7] proved
global convergence of the algorithm in 1976. By contrast, in the nonsmooth case, de-
spite substantial computational experience, the method is supported by little theory.
Beyond one dimension, with the exception of some contrived model examples [4],
the only previous convergence proof for the standard BFGS algorithm applied to
a nonsmooth function seems to be the analysis of the two-dimensional Euclidean
norm in [3].
∗ ORIE, Cornell University, Ithaca, NY 14853, U.S.A. [email protected].
† ORIE, Cornell University, Ithaca, NY 14853, U.S.A. people.orie.cornell.edu/aslewis. Research supported in part by National Science Foundation Grant DMS-1613996.

Figure 1: BFGS method for f(u, v) = u² + |v|. A thousand random starts, using inexact line search, and initial approximate Hessian I. Semilog plots of function value f(u_k, v_k), initially normalized. Panel 1: against iteration count k. (Bold line plots 2^{−2k}.) Panel 2: against function evaluation count, including line search.

As a simple illustration, consider the nonsmooth convex function f : R² → R defined by f(u, v) = u² + |v|. A routine implementation of the BFGS method, using

a random initial point and a standard backtracking line search, invariably converges
to the unique optimizer at zero. Not surprisingly, the method of steepest descent,
using the same line search, often converges to a nonoptimal point (u, 0) with u ≠ 0.
For example, Figure 1 plots function values for a thousand runs of BFGS against
both iteration count and a count of the number of function-gradient evaluations,
including those incurred in each line search. (Precisely, the initial Hessian approximation is the identity, the weak Wolfe line search uses Armijo parameter 10^{−4} and Wolfe parameter 0.9, and the initial function value is normalized to one.) The results compellingly support convergence, and indeed suggest a linear rate: the bold line overlaid on the first panel corresponds to the BFGS iterates (2^{−k}, (2/5)(−1)^k 2^{−2k})
generated by an exact line search [4]. However, even for this very simple example,
a general convergence result does not seem easy.
Nonetheless, Powell’s theory does have consequences even in the nonsmooth case.
Loosely speaking, we prove, at least under a strict-convexity-like assumption, that
global convergence can only fail for the BFGS method if a subsequence of the iterates
converges to a nonsmooth point. For example, for the function f(u, v) = u² + |v|, BFGS iterates cannot remain a uniform distance away from the line v = 0. While
intuitive — a successful smooth algorithm should somehow detect nonsmoothness
— this result is also reassuring, and in fact suffices to prove convergence on some
interesting examples. An analogous technique proves convergence for the Euclidean

norm on R^n, generalizing the result for n = 2 in [3].

2 BFGS sequences
Given a set U ⊂ R^n, we consider the BFGS method for minimizing a possibly nonsmooth function f : U → R. We call a sequence (x_k) in U “BFGS” if the BFGS
method could generate it using a line search satisfying the Armijo and weak Wolfe
conditions. More precisely, we make the following definition.

Definition 2.1 A sequence (x_k) is a BFGS sequence for the function f if f is differentiable at each iterate x_k with nonzero gradient ∇f(x_k), and there exist parameters µ < ν in the interval (0, 1) and an n-by-n positive definite matrix H_0 such that the vectors

s_k = x_{k+1} − x_k   and   y_k = ∇f(x_{k+1}) − ∇f(x_k)

and the matrices defined recursively by

(2.2)   V_k = I − (s_k y_k^T)/(s_k^T y_k)   and   H_{k+1} = V_k H_k V_k^T + (s_k s_k^T)/(s_k^T y_k)

satisfy

(2.3)   H_k ∇f(x_k) ∈ −R_+ s_k,
(2.4)   f(x_{k+1}) ≤ f(x_k) + µ ∇f(x_k)^T s_k,
(2.5)   ∇f(x_{k+1})^T s_k ≥ ν ∇f(x_k)^T s_k

for k = 0, 1, 2, . . ..

Notice that this property is independent of any particular line search algorithm
used to generate the sequence (x_k): it depends only on the sequences of function values f(x_k) and gradients ∇f(x_k). Conceptually, in the definition, the matrices H_k are approximate inverse Hessians for the function f at the iterate x_k: the equations (2.2) define the BFGS quasi-Newton update and the inclusion (2.3) expresses the fact that the step s_k is in the corresponding approximate Newton direction.
The inequalities (2.4) and (2.5) are the Armijo and weak Wolfe line search conditions, with parameters µ and ν respectively. By a simple and standard induction argument, they imply that s_k^T y_k > 0 holds for all k, ensuring the matrices H_k are well-defined and positive definite, and that the function values f(x_k) decrease strictly. An implementation of the BFGS method for a convex function f using a standard backtracking line search will generate a BFGS sequence of iterates, assuming that those iterates stay in the set U and that the method never encounters a nonsmooth or critical point.
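As an illustrative sketch (ours, not part of the paper), the following Python code implements the recursion (2.2) with a bracketing line search enforcing the Armijo condition (2.4) and the weak Wolfe condition (2.5), and runs it on the introduction's example f(u, v) = u² + |v|. The function names, iteration caps, and tolerances are our own choices.

```python
import numpy as np

def bfgs(f, grad, x0, H0=None, mu=1e-4, nu=0.9, max_iter=100):
    """BFGS with a weak Wolfe line search, following Definition 2.1."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x)) if H0 is None else np.asarray(H0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        d = -H @ g                         # direction satisfying inclusion (2.3)
        # bracketing search for a step obeying Armijo (2.4) and weak Wolfe (2.5)
        t, lo, hi = 1.0, 0.0, np.inf
        for _ in range(60):
            if f(x + t * d) > f(x) + mu * t * (g @ d):    # (2.4) fails: shrink
                hi = t
            elif grad(x + t * d) @ d < nu * (g @ d):      # (2.5) fails: grow
                lo = t
            else:
                break
            t = 2 * t if hi == np.inf else (lo + hi) / 2
        s = t * d
        y = grad(x + s) - g
        if s @ y > 0:                      # curvature holds: update H by (2.2)
            V = np.eye(len(x)) - np.outer(s, y) / (s @ y)
            H = V @ H @ V.T + np.outer(s, s) / (s @ y)
        x = x + s
    return x

# the paper's example, nonsmooth along the line v = 0
f = lambda x: x[0] ** 2 + abs(x[1])
grad = lambda x: np.array([2.0 * x[0], np.sign(x[1])])

x_star = bfgs(f, grad, [1.0, 1.0])
```

Consistent with the experiments of Figure 1, the iterates approach the unique minimizer at the origin despite the nonsmoothness there.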

Example: a simple nonsmooth function
Consider the function f : R² → R defined by f(u, v) = u² + |v|. (We abuse notation slightly and identify the vector [u v]^T ∈ R² with the point (u, v).) Then the sequence in R² defined by

x_k = (2^{−k}, (2/5)(−1)^k 2^{−2k})   (k = 0, 1, 2, . . .)

is a BFGS sequence, as observed in [4, Prop 3.2]. Specifically, if we define the matrix

H_0 = [ 1/4  0 ; 0  1/2 ],

then the definition of a BFGS sequence holds for any parameter values µ ∈ (0, 0.7] and ν ∈ (µ, 1). In this example, the “exact” line search property ∇f(x_{k+1})^T s_k = 0 holds for all k, and the approximate inverse Hessians are

H_1 = [ 1/2  0 ; 0  1/4 ],   H_k = (1/6) [ 5  (−1)^k 2^{1−k} ; (−1)^k 2^{1−k}  2^{3−2k} ]   (k > 1).
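The exact line search property claimed above is easy to confirm numerically. The short script below (our own check, not part of the paper) verifies ∇f(x_{k+1})^T s_k = 0 along the stated sequence.

```python
import numpy as np

# Along x_k = (2^{-k}, (2/5)(-1)^k 2^{-2k}) for f(u, v) = u^2 + |v|,
# check the exact line search property grad f(x_{k+1})^T s_k = 0.
def x(k):
    return np.array([2.0 ** -k, 0.4 * (-1) ** k * 2.0 ** (-2 * k)])

def grad(u, v):
    return np.array([2.0 * u, np.sign(v)])   # gradient of u^2 + |v| for v != 0

max_dev = max(abs(grad(*x(k + 1)) @ (x(k + 1) - x(k))) for k in range(10))
```

Up to rounding, the deviation is zero for every k tested.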

Example: the Euclidean norm

Consider the function f = ‖·‖ on R². Beginning with the initial vector [1 0]^T, generate a sequence of vectors by, at each iteration, rotating clockwise through an angle of π/3 and shrinking by a factor 1/2. The result is a BFGS sequence for f, as observed in [3]. Specifically, if we define the matrix

H_0 = [ 3  −√3 ; −√3  3 ],

then the definition of a BFGS sequence holds for any parameter values µ ∈ (0, 2/3] and any ν ∈ (µ, 1). Again, the exact line search property ∇f(x_{k+1})^T s_k = 0 holds for all k. In this case the approximate inverse Hessians have eigenvalues behaving asymptotically like 2^{−k}(3 ± √3) (see [3]).
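The rotate-and-shrink construction can be checked numerically in the same way; the script below (again our own check) verifies the exact line search property ∇f(x_{k+1})^T s_k = 0 for f = ‖·‖ along the iterates.

```python
import numpy as np

theta = -np.pi / 3                          # clockwise rotation through pi/3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
devs = []
for k in range(20):
    x_next = 0.5 * (R @ x)                  # rotate, then shrink by 1/2
    s = x_next - x
    g_next = x_next / np.linalg.norm(x_next)  # gradient of the norm away from 0
    devs.append(abs(g_next @ s))
    x = x_next
max_dev = max(devs)
```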

3 Main result
The following theorem captures a key global convergence property of the BFGS
method.
Theorem 3.1 (Powell, 1976) Consider an open convex set U ⊂ R^n containing a BFGS sequence (x_k) for a convex function f : U → R. Assume that the level set {x ∈ U : f(x) ≤ f(x_0)} is bounded, and that

(3.2)   ∇²f is continuous throughout U.

Then the sequence of function values f(x_k) converges to min f.
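To see the theorem's setting in action, one can minimize any C² convex function with bounded level sets. The sketch below uses SciPy's BFGS implementation as a stand-in for the algorithm of Section 2, on an illustrative function of our own choosing (not from the paper) whose minimum value is 1, attained at the origin.

```python
import numpy as np
from scipy.optimize import minimize

# x^2 + y^4 + cosh(x - y) is C^2 and convex with bounded level sets;
# its gradient vanishes only at (0, 0), where the value is cosh(0) = 1.
f = lambda x: x[0] ** 2 + x[1] ** 4 + np.cosh(x[0] - x[1])

res = minimize(f, x0=[2.0, -1.0], method='BFGS')
```

Consistent with Powell's theorem, the function values are driven to min f = 1.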

Among the assumptions in Powell’s theorem, at least for dimension n > 2 (see [8]), convexity is central. Although the BFGS method works well in practice on general smooth functions [6], nonconvex counterexamples are known where convergence fails: in particular, [1] presents a bounded but nonconvergent BFGS sequence for a polynomial f : R^4 → R. In the general convex case, on the other hand, whether the smoothness assumption (3.2) can be weakened seems unclear.
We present here a result analogous to Powell’s theorem. We modify the as-
sumptions, strengthening the convexity assumption but weakening the smoothness
requirement (3.2). Similar results to the one below hold for many common minimiza-
tion algorithms possessing suitable global convergence properties in the smooth case.
Such algorithms generate sequences of iterates x_k characterized by certain properties of the function values f(x_k) and gradients ∇f(x_k) (for k = 0, 1, 2, . . .), analogous to the definition of a BFGS sequence. Provided the algorithm generates function values f(x_k) that must decrease to the minimum value min f for any convex function whose level sets are bounded and whose Hessian is continuous and positive definite throughout those level sets, exactly the same proof technique applies. Examples of
such algorithms include standard versions of steepest descent [6], coordinate descent
(see for example [5]), and conjugate gradient methods (see for example [2]). Here
we concentrate on BFGS because, in striking contrast to these methods, the BFGS
method works well in practice on nonsmooth functions [3].

Theorem 3.3 Powell’s Theorem also holds with the smoothness assumption (3.2) replaced by the following assumption:

(3.4)   ∇²f is positive-definite and continuous throughout an open set V ⊂ U containing the set cl(x_k) and satisfying inf_V f = min f.

Proof We consider an open convex set U ⊂ R^n containing a BFGS sequence (x_k) for a convex function f : U → R satisfying assumption (3.4). We further assume that the level set {x ∈ U : f(x) ≤ f(x_0)} is bounded, and our aim is to prove that the sequence of function values f(x_k) converges to min f.
Assume first that the theorem is true in the special case when U = R^n and the complement V^c is bounded. We then deduce the general case as follows. Note by assumption that the function f is not constant, so by convexity there exists a point x̄ ∈ U with f(x̄) > f(x_0). Convexity also ensures that f is L-Lipschitz on the nonempty compact convex set

K = {x ∈ U : f(x) ≤ f(x̄)},

for some constant L > 0. Hence there exists a convex Lipschitz function f̂ : R^n → R agreeing with f on K, specifically the Lipschitz regularization defined by

f̂(y) = min_{x∈K} { f(x) + L‖y − x‖ }   (y ∈ R^n).

Now, for any sufficiently large β ∈ R, the convex function f̃ : R^n → R defined by

f̃(x) = max{ f̂(x), (1/2)‖x‖² − β }   (x ∈ R^n)

also agrees with f on K. The Hessian of f̃ is just the identity throughout the open set

W = { x : f̂(x) < (1/2)‖x‖² − β }.

Furthermore, this set has bounded complement, and therefore so does the open set

Ṽ = W ∪ {x ∈ V : f(x) < f(x̄)}.

Now notice that (x_k) is also a BFGS sequence for the function f̃, and all the assumptions of the theorem hold with f replaced by f̃, U replaced by R^n, and V replaced by Ṽ. Applying the special case of the theorem, we deduce

f(x_k) = f̃(x_k) → min f̃ = min f,

as required.
We can therefore concentrate on the special case when U = R^n and the set N = V^c is compact. We can assume N is nonempty, since otherwise the result follows immediately from Powell’s Theorem. The convex function f is then continuous throughout R^n. It is not constant, and hence is unbounded above. Furthermore, by assumption, the initial point x_0 is not a minimizer, so all the level sets {x : f(x) ≤ α} are compact. Since N is compact and f is continuous, we can fix a constant α > f(x_0) satisfying α > max_N f.
Since the values f(x_k) are decreasing, the sequence (x_k) is bounded and hence the closure cl(x_k) is compact. For all sufficiently small ε > 0, we then have

cl(x_k) ∩ (N + 2εB) = ∅   and   max_{N+2εB} f < α,

where B denotes the closed unit ball in R^n. The distance function d_N : R^n → R defined by d_N(x) = min_{y∈N} ‖y − x‖ (for x ∈ R^n) is continuous, so the set

Ω_ε = {x : d_N(x) ≥ 2ε and f(x) ≤ α}

is compact, and is contained in the open set {x : d_N(x) > ε}. On this open set, the function f is convex, in the sense of [11], and C² with positive-definite Hessian. Hence, by [11, Theorem 3.2], there exists a C² convex function f_ε on a convex open neighborhood U_ε of the convex hull conv Ω_ε agreeing with f on Ω_ε. Our choice of ε ensures

{x : f(x) = α} ⊂ Ω_ε ⊂ {x : f(x) ≤ α},

so in fact conv Ω_ε = {x : f(x) ≤ α}. (Although superfluous for this proof, [11, Theorem 3.2] even guarantees that f_ε has positive-definite Hessian on this compact convex set, and hence is strongly convex on it.)
We next observe that the level set {x ∈ U_ε : f_ε(x) ≤ f_ε(x_0)} is bounded, since it is contained in the set {x : f(x) ≤ α}. Otherwise there would exist a point x ∈ U_ε satisfying f_ε(x) ≤ f_ε(x_0) = f(x_0) < α and f(x) > α. By continuity of f, there exists a point y on the line segment between x_0 and x satisfying f(y) = α. But then we must have y ∈ Ω_ε and hence f_ε(y) = f(y) = α, contradicting the convexity of f_ε.
The values and gradients of the functions f and f_ε : U_ε → R agree at each iterate x_k, so since those iterates comprise a BFGS sequence for f, they also do so for f_ε. We can therefore apply Theorem 3.1 to deduce

f_ε(x_k) = f(x_k) ↓ min f_ε   as k → ∞.

By assumption, there exists a sequence of points x_r ∈ V (for r = 1, 2, 3, . . .) satisfying lim_r f(x_r) = min f. For any fixed index r, we know x_r ∈ Ω_ε for all ε > 0 sufficiently small, so we have

min f ≤ lim_k f(x_k) = min f_ε ≤ f_ε(x_r) = f(x_r).

Taking the limit as r → ∞ shows lim_k f(x_k) = min f, as required. □

The following consequence suggests simple examples.

Corollary 3.5 Powell’s Theorem also holds with smoothness assumption (3.2) replaced by the assumption that ∇²f is positive-definite and continuous throughout the set {x ∈ U : f(x) > min f}.

Proof Suppose the result fails. The given set, which we denote by V, must contain the set cl(x_k): otherwise there would exist a subsequence of (x_k) converging to a minimizer of f, and since the values f(x_k) decrease monotonically, they would converge to min f, a contradiction. Clearly we have inf_V f = min f. But now applying Theorem 3.3 gives a contradiction. □

Corollary 3.6 Consider an open semi-algebraic convex set U ⊂ R^n containing a BFGS sequence for a semi-algebraic strongly convex function f : U → R with bounded level sets. Assume that the sequence and all its limit points lie in the interior of the set where f is twice differentiable. Then the sequence of function values converges to the minimum value of f.

Proof Denote the interior of the set where f is twice differentiable by V. Standard results in semi-algebraic geometry [10, p. 502] guarantee that V is dense in U, whence inf_V f = min f, and furthermore that the Hessian ∇²f is continuous throughout V, and hence positive-definite by strong convexity. The result now follows by Theorem 3.3. □

The open set V in the proof of Corollary 3.6, where the function f is smooth, has full measure in the underlying set U. Hence, if we initialize the algorithm in question with a starting point x_0 generated at random from a continuous probability distribution on U, and use a computationally realistic line search to generate each iterate x_k from its predecessor, then we would expect (x_k) ⊂ V almost surely. Then, according to the result, one (or both) of two cases holds.

(i) The algorithm succeeds: f(x_k) → min f.

(ii) A subsequence of the iterates converges to a point where f is nonsmooth.

Extensive computational experiments suggest case (i) holds almost surely [3].
Like Theorem 3.3, analogous versions of Corollary 3.6 hold for many other algo-
rithms, in addition to the BFGS method. By contrast with BFGS, however, those
algorithms often fail in general, due to the possibility of case (ii). In the special
situation described in Corollary 3.5, case (ii) implies case (i), so analogous results
will hold for many common algorithms, like steepest descent, coordinate descent, or
conjugate gradients.

4 Special constructions
Unlike Powell’s original result, Theorem 3.3 requires the Hessian ∇²f to be positive-definite on an appropriate set, an assumption that fails for some simple but interesting examples like the Euclidean norm. We can sometimes circumvent this difficulty
by a more direct construction, avoiding tools from [11]. The following result is a
version of Corollary 3.5 under a more complicated but weaker assumption.

Theorem 4.1 Powell’s Theorem also holds with the smoothness assumption (3.2)
replaced by the following weaker condition:

For all constants δ > 0, there is a convex open neighborhood U_δ ⊂ U of the set {x ∈ U : f(x) ≤ f(x_0)}, and a C² convex function f_δ : U_δ → R satisfying f_δ(x) = f(x) whenever f(x_0) ≥ f(x) ≥ min f + δ.

Proof Clearly condition (3.2) implies the given condition, since we could choose U_δ = U and f_δ = f. Assuming this new condition instead, suppose the conclusion of Powell’s Theorem 3.1 fails, so there exists a number δ > 0 such that f(x_k) > min f + 2δ for all k = 0, 1, 2, . . .. Consider the function f_δ guaranteed by our assumption. Since f is continuous, there exists a point x̄ ∈ U satisfying f(x̄) = min f + δ, and since f_δ(x̄) = f(x̄), we deduce min f_δ ≤ min f + δ.
Since (x_k) is a BFGS sequence for the function f, it is also a BFGS sequence for the function f_δ. Applying Theorem 3.1 with f replaced by f_δ shows the contradiction

min f + 2δ ≤ f(x_k) = f_δ(x_k) ↓ min f_δ ≤ min f + δ,

so the result follows. □

We can apply this result directly to the Euclidean norm.

Corollary 4.2 Any BFGS sequence for the Euclidean norm on R^n converges to zero.

Proof For any δ > 0, consider the function g_δ : R → R defined by

(4.3)   g_δ(t) = (δ³ + 3δt² − |t|³)/(3δ²)   (|t| ≤ δ);   g_δ(t) = |t|   (|t| ≥ δ).

This function is C² convex and symmetric. The function f_δ : R^n → R defined by f_δ(x) = g_δ(‖x‖) is also C² convex, either as a consequence of [9] or via a straightforward direct calculation. The result now follows from Theorem 4.1. □
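A quick numerical sanity check (our own script, not the authors') of the smoothing (4.3): g_δ agrees with |t| at the seam |t| = δ, and the one-sided slopes there both approach 1, consistent with the claim that g_δ is C².

```python
# Piecewise formula (4.3): a cubic smoothing of |t| on [-delta, delta].
def g(t, delta):
    if abs(t) <= delta:
        return (delta**3 + 3 * delta * t**2 - abs(t)**3) / (3 * delta**2)
    return abs(t)

delta, h = 0.5, 1e-6
seam_gap = abs(g(delta, delta) - delta)                    # values match at t = delta
slope_in = (g(delta, delta) - g(delta - h, delta)) / h     # slope from inside the seam
slope_out = (g(delta + h, delta) - g(delta, delta)) / h    # slope from outside the seam
```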

Analogously, the following result is a more direct version of Theorem 3.3.

Theorem 4.4 Powell’s Theorem also holds with the smoothness assumption (3.2) replaced by the assumption that some open set V ⊂ U containing the set cl(x_k) and satisfying inf_V f = min f also satisfies the following condition:

For all constants δ > 0, there is a convex open neighborhood U_δ ⊂ U of the set {x ∈ U : f(x) ≤ f(x_0)}, and a C² convex function f_δ : U_δ → R satisfying f_δ(x) = f(x) for all points x ∈ U_δ such that d_{V^c}(x) > δ.

Proof Denote the distance between the compact set cl(x_k) and the closed set V^c by δ̄, so we know δ̄ > 0. For any constant δ ∈ (0, δ̄), we have d_{V^c}(x_k) > δ for all indices k = 0, 1, 2, . . ., and hence f_δ(x_k) = f(x_k).
The values and gradients of the functions f and f_δ agree at each iterate x_k, so since those iterates comprise a BFGS sequence for f, they also do so for f_δ. We can therefore apply Theorem 3.1 to deduce

f(x_k) = f_δ(x_k) ↓ min f_δ   as k → ∞.

By assumption, there exists a sequence of points x_r ∈ V (for r = 1, 2, 3, . . .) satisfying lim_r f(x_r) = min f. For any fixed index r, we know d_{V^c}(x_r) > δ for all sufficiently small δ > 0, so since f_δ(x_r) = f(x_r), we deduce min f_δ ≤ f(x_r). The inequality lim_k f(x_k) ≤ f(x_r) follows, and letting r → ∞ proves lim_k f(x_k) = min f as required. □

We end by proving a claim from the introduction.

Corollary 4.5 Any BFGS sequence for the function f : R² → R given by f(u, v) = u² + |v| has a subsequence converging to a point on the line v = 0.

Proof Suppose the result fails, so some BFGS sequence (u_k, v_k) has its closure contained in the open set

V = {(u, v) ∈ R² : v ≠ 0}.

Clearly we have inf_V f = min f. For any constant δ > 0, define a function f_δ : R² → R by f_δ(u, v) = u² + g_δ(v), where the function g_δ is given by equation (4.3). Then we have f(u, v) = f_δ(u, v) for any point (u, v) satisfying |v| > δ, or equivalently d_{V^c}(u, v) > δ. Hence the assumptions of Theorem 4.4 hold (using the set U_δ = R²), so we deduce f(u_k, v_k) → 0, and hence (u_k, v_k) → (0, 0). This contradiction completes the proof. □

As we remarked in the introduction, numerical evidence strongly supports a conjecture that all BFGS sequences for the function f(u, v) = u² + |v| converge to zero. That conjecture remains open.

References
[1] Y.-H. Dai. A perfect example for the BFGS method. Mathematical Program-
ming, 138(1):501–530, 2013.

[2] J.C. Gilbert and J. Nocedal. Global convergence properties of conjugate gra-
dient methods for optimization. SIAM J. Optim., 2(1):21–42, 1992.

[3] A.S. Lewis and M.L. Overton. Nonsmooth optimization via quasi-Newton meth-
ods. Math. Program., 141(1-2, Ser. A):135–163, 2013.

[4] A.S. Lewis and S. Zhang. Nonsmoothness and a variable metric method. J.
Optim. Theory Appl., 165(1):151–171, 2015.

[5] Z.Q. Luo and P. Tseng. On the convergence of the coordinate descent method
for convex differentiable minimization. J. Optim. Theory Appl., 72(1):7–35,
1992.

[6] J. Nocedal and S.J. Wright. Numerical Optimization. Springer Series in Opera-
tions Research and Financial Engineering. Springer, New York, second edition,
2006.

[7] M.J.D. Powell. Some global convergence properties of a variable metric algo-
rithm for minimization without exact line searches. In Nonlinear Programming
(Proc. Sympos., New York, 1975), pages 53–72. SIAM–AMS Proc., Vol. IX.
Amer. Math. Soc., Providence, R. I., 1976.

[8] M.J.D. Powell. On the convergence of the DFP algorithm for unconstrained
optimization when there are only two variables. Math. Program., 87(2, Ser.
B):281–301, 2000. Studies in algorithmic optimization.

[9] H.S. Sendov. Nonsmooth analysis of Lorentz invariant functions. SIAM J. Optim., 18(3):1106–1127, 2007.

[10] L. van den Dries and C. Miller. Geometric categories and o-minimal structures.
Duke Math. J., 84(2):497–540, 1996.

[11] M. Yan. Extension of convex function. J. Convex Anal., 21(4):965–987, 2014.
