
Limited-memory BFGS
Limited-memory BFGS (L-BFGS or LM-BFGS) is an optimization algorithm in the family of
quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm
(BFGS) using a limited amount of computer memory. It is a popular algorithm for parameter
estimation in machine learning.[1][2] The algorithm's target problem is to minimize f(x) over
unconstrained values of the real vector x, where f is a differentiable scalar function.

Like the original BFGS, L-BFGS uses an estimate of the inverse Hessian matrix to steer its search
through variable space, but where BFGS stores a dense n × n approximation to the inverse
Hessian (n being the number of variables in the problem), L-BFGS stores only a few vectors that
represent the approximation implicitly. Due to its resulting linear memory requirement, the
L-BFGS method is particularly well suited for optimization problems with many variables. Instead
of the inverse Hessian H_k, L-BFGS maintains a history of the past m updates of the position x and
gradient ∇f(x), where generally the history size m can be small (often m < 10). These updates are
used to implicitly do operations requiring the H_k-vector product.
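
A quick back-of-the-envelope calculation makes the memory savings concrete. The sketch below uses
illustrative numbers only (n = 10^6 variables, history size m = 10, double precision), not figures
from any particular application:

    n, m = 10**6, 10          # number of variables and history size (illustrative)
    bytes_per_float = 8       # double precision

    dense_bfgs = n * n * bytes_per_float        # dense n x n inverse Hessian: ~8 TB
    lbfgs_pairs = 2 * m * n * bytes_per_float   # m pairs (s_i, y_i) of length n: ~160 MB

    print(f"BFGS:   {dense_bfgs / 1e12:.1f} TB")
    print(f"L-BFGS: {lbfgs_pairs / 1e6:.0f} MB")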

Contents
Algorithm
Applications
Variants
L-BFGS-B
OWL-QN
O-LBFGS
Implementation of variants
Works cited
Further reading

Algorithm
The algorithm starts with an initial estimate of the optimal value, x_0, and proceeds iteratively to
refine that estimate with a sequence of better estimates x_1, x_2, .... The derivatives of the function
g_k := ∇f(x_k) are used as a key driver of the algorithm to identify the direction of steepest
descent, and also to form an estimate of the Hessian matrix (second derivative) of f(x).

L-BFGS shares many features with other quasi-Newton algorithms, but is very different in how the
matrix-vector multiplication d_k = −H_k g_k is carried out, where d_k is the approximate Newton's
direction, g_k is the current gradient, and H_k is the inverse of the Hessian matrix. There are
multiple published approaches using a history of updates to form this direction vector. Here, we
give a common approach, the so-called "two loop recursion."[3][4]

We take as given x_k, the position at the k-th iteration, and g_k ≡ ∇f(x_k), where f is the function
being minimized, and all vectors are column vectors. We also assume that we have stored the last
m updates of the form

    s_k = x_{k+1} − x_k,
    y_k = g_{k+1} − g_k.

We define ρ_k = 1 / (y_k^T s_k), and H_k^0 will be the 'initial' approximate of the inverse Hessian that our
estimate at iteration k begins with.

The algorithm is based on the BFGS recursion for the inverse Hessian as

    H_{k+1} = (I − ρ_k s_k y_k^T) H_k (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T.

For a fixed k we define a sequence of vectors q_{k−m}, …, q_k as q_k := g_k and
q_i := (I − ρ_i y_i s_i^T) q_{i+1}. Then a recursive algorithm for calculating q_i from q_{i+1} is to define
α_i := ρ_i s_i^T q_{i+1} and q_i = q_{i+1} − α_i y_i. We also define another sequence of vectors
z_{k−m}, …, z_k as z_i := H_i q_i. There is another recursive algorithm for calculating these vectors,
which is to define z_{k−m} = H_k^0 q_{k−m} and then recursively define β_i := ρ_i y_i^T z_i and
z_{i+1} = z_i + (α_i − β_i) s_i. The value of z_k is then our ascent direction.

Thus we can compute the descent direction as follows:

    q = g_k
    for i = k−1, k−2, …, k−m:
        α_i = ρ_i s_i^T q
        q = q − α_i y_i
    γ_k = (s_{k−1}^T y_{k−1}) / (y_{k−1}^T y_{k−1})
    H_k^0 = γ_k I
    z = H_k^0 q
    for i = k−m, k−m+1, …, k−1:
        β_i = ρ_i y_i^T z
        z = z + s_i (α_i − β_i)
    z = −z

This formulation gives the search direction for the minimization problem, i.e., z = −H_k g_k. For
maximization problems, one should thus take −z instead. Note that the initial approximate inverse
Hessian H_k^0 is chosen as a diagonal matrix or even a multiple of the identity matrix, since this is
numerically efficient.
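
The two-loop recursion translates almost line for line into code. The following is a minimal NumPy
sketch, assuming the stored pairs s_i = x_{i+1} − x_i and y_i = g_{i+1} − g_i are kept in Python lists
with the most recent pair last; the function name and interface are this example's own, not taken
from any particular library:

    import numpy as np

    def two_loop_recursion(grad, s_list, y_list, rho_list):
        """Compute the L-BFGS descent direction z = -H_k @ grad implicitly.

        s_list[i], y_list[i] are the stored update pairs (most recent last) and
        rho_list[i] = 1.0 / (y_list[i] @ s_list[i]).
        """
        if not s_list:                      # no history yet: fall back to steepest descent
            return -np.asarray(grad, dtype=float)
        q = np.array(grad, dtype=float)
        alphas = []
        # First loop: walk the history from newest to oldest pair.
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rho_list)):
            alpha = rho * (s @ q)
            q -= alpha * y
            alphas.append(alpha)
        # Initial scaling gamma_k = (s^T y) / (y^T y) from the most recent pair.
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
        z = gamma * q                       # H_k^0 = gamma_k * I
        # Second loop: walk the history from oldest to newest pair.
        for (s, y, rho), alpha in zip(zip(s_list, y_list, rho_list), reversed(alphas)):
            beta = rho * (y @ z)
            z += (alpha - beta) * s
        return -z                           # descent direction

In a full optimizer this direction would be combined with a line search and with the history update
discussed below.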

The scaling of the initial matrix γ_k ensures that the search direction is well scaled and therefore
the unit step length is accepted in most iterations. A Wolfe line search is used to ensure that the
curvature condition y_k^T s_k > 0 is satisfied and the BFGS updating is stable. Note that some software
implementations use an Armijo backtracking line search, but cannot guarantee that the curvature
condition will be satisfied by the chosen step, since a step length greater than 1 may be
needed to satisfy this condition. Some implementations address this by skipping the BFGS update
when y_k^T s_k is negative or too close to zero, but this approach is not generally recommended since
the updates may be skipped too often to allow the Hessian approximation to capture important
curvature information.
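
A minimal sketch of the cautious-update policy just described, assuming the same s_list/y_list/rho_list
history as in the earlier example; the threshold eps and the helper name are illustrative choices,
not part of any published implementation:

    import numpy as np

    def maybe_store_pair(s, y, s_list, y_list, rho_list, m=10, eps=1e-10):
        """Append the curvature pair (s, y) only if the curvature condition holds.

        Skips the update when y^T s is negative or too close to zero, so the
        implicit inverse-Hessian approximation stays positive definite.
        """
        ys = float(y @ s)
        if ys <= eps * np.linalg.norm(y) * np.linalg.norm(s):
            return False                      # skip: unreliable curvature information
        if len(s_list) == m:                  # keep only the last m pairs
            s_list.pop(0); y_list.pop(0); rho_list.pop(0)
        s_list.append(s); y_list.append(y); rho_list.append(1.0 / ys)
        return True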

This two loop update only works for the inverse Hessian. Approaches to implementing L-BFGS
using the direct approximate Hessian have also been developed, as have other means of
approximating the inverse Hessian.[5]

Applications
L-BFGS has been called "the algorithm of choice" for fitting log-linear (MaxEnt) models and
conditional random fields with ℓ₂-regularization.[1][2]

Variants
Since BFGS (and hence L-BFGS) is designed to minimize smooth functions without constraints, the
L-BFGS algorithm must be modified to handle functions that include non-differentiable
components or constraints. A popular class of modifications are called active-set methods, based
on the concept of the active set. The idea is that when restricted to a small neighborhood of the
current iterate, the function and constraints can be simplified.

L-BFGS-B

The L-BFGS-B algorithm extends L-BFGS to handle simple box constraints (aka bound
constraints) on variables; that is, constraints of the form li ≤ xi ≤ ui where li and ui are per-variable
constant lower and upper bounds, respectively (for each xi, either or both bounds may be omitted).
[6][7] The method works by identifying fixed and free variables at every step (using a simple
gradient method), and then using the L-BFGS method on the free variables only to get higher
accuracy, and then repeating the process.
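
For example, this variant is exposed in SciPy as scipy.optimize.minimize with method="L-BFGS-B".
A small usage sketch with a made-up quadratic objective and illustrative bounds (the specific
numbers are only for demonstration):

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative objective: a quadratic centred at 2, with its gradient.
    def f(x):
        return np.sum((x - 2.0) ** 2)

    def grad_f(x):
        return 2.0 * (x - 2.0)

    x0 = np.zeros(5)
    bounds = [(0.0, 1.5)] * 5     # box constraints l_i <= x_i <= u_i

    res = minimize(f, x0, jac=grad_f, method="L-BFGS-B", bounds=bounds)
    print(res.x)                  # every component is pushed to its upper bound 1.5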

OWL-QN

Orthant-wise limited-memory quasi-Newton (OWL-QN) is an L-BFGS variant for fitting
ℓ₁-regularized models, exploiting the inherent sparsity of such models.[2] It minimizes functions of
the form

    f(x) = g(x) + C‖x‖₁

where g is a differentiable convex loss function and C > 0 is a constant. The method is an
active-set type method: at each
iterate, it estimates the sign of each component of the variable, and restricts the subsequent step to
have the same sign. Once the sign is fixed, the non-differentiable ‖x‖₁ term becomes a smooth
linear term which can be handled by L-BFGS. After an L-BFGS step, the method allows some
variables to change sign, and repeats the process.
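
One ingredient of this scheme is the pseudo-gradient of f(x) = g(x) + C‖x‖₁, which OWL-QN uses to
pick a sign (orthant) for each zero component. The sketch below is an illustrative NumPy rendering
of that definition; the function name and interface are this example's own, not the authors' code:

    import numpy as np

    def pseudo_gradient(x, grad_g, C):
        """Pseudo-gradient of g(x) + C * ||x||_1.

        Where x_i != 0 the l1 term is differentiable and contributes C * sign(x_i).
        Where x_i == 0, take the one-sided derivative that allows descent, or 0
        if neither direction decreases the objective.
        """
        pg = np.where(x > 0, grad_g + C, np.where(x < 0, grad_g - C, 0.0))
        at_zero = (x == 0)
        right = grad_g + C      # directional derivative for x_i moving positive
        left = grad_g - C       # directional derivative for x_i moving negative
        pg = np.where(at_zero & (right < 0), right, pg)   # descend into the positive orthant
        pg = np.where(at_zero & (left > 0), left, pg)     # descend into the negative orthant
        return pg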

O-LBFGS

Schraudolph et al. present an online approximation to both BFGS and L-BFGS.[8] Similar to
stochastic gradient descent, this can be used to reduce the computational complexity by evaluating
the error function and gradient on a randomly drawn subset of the overall dataset in each
iteration. It has been shown that O-LBFGS has a global almost sure convergence [9] while the
online approximation of BFGS (O-BFGS) is not necessarily convergent.[10]

Implementation of variants
The L-BFGS-B variant also exists as ACM TOMS algorithm 778.[7][11] In February 2011, some of the
authors of the original L-BFGS-B code posted a major update (version 3.0).

A reference implementation is available in Fortran 77 (and with a Fortran 90 interface).[12][13] This
version, as well as older versions, has been converted to many other languages.

A C++ implementation of OWL-QN is available from its designers.[2][14]

Works cited
1. Malouf, Robert (2002). "A comparison of algorithms for maximum entropy parameter estimation". Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002). pp. 49–55. doi:10.3115/1118853.1118871.
2. Andrew, Galen; Gao, Jianfeng (2007). "Scalable training of L₁-regularized log-linear models". Proceedings of the 24th International Conference on Machine Learning. doi:10.1145/1273496.1273501. ISBN 9781595937933.
3. Matthies, H.; Strang, G. (1979). "The solution of non linear finite element equations". International Journal for Numerical Methods in Engineering. 14 (11): 1613–1626. doi:10.1002/nme.1620141104.
4. Nocedal, J. (1980). "Updating Quasi-Newton Matrices with Limited Storage". Mathematics of Computation. 35 (151): 773–782. doi:10.1090/S0025-5718-1980-0572855-7.
5. Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods". Mathematical Programming. 63 (4): 129–156. doi:10.1007/BF01582063.
6. Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. (1995). "A Limited Memory Algorithm for Bound Constrained Optimization". SIAM J. Sci. Comput. 16 (5): 1190–1208. doi:10.1137/0916069.
7. Zhu, C.; Byrd, Richard H.; Lu, Peihuang; Nocedal, Jorge (1997). "Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236.
8. Schraudolph, N.; Yu, J.; Günter, S. (2007). "A stochastic quasi-Newton method for online convex optimization". AISTATS.
9. Mokhtari, A.; Ribeiro, A. (2015). "Global convergence of online limited memory BFGS". Journal of Machine Learning Research. 16: 3151–3181. arXiv:1409.2045.
10. Mokhtari, A.; Ribeiro, A. (2014). "RES: Regularized Stochastic BFGS Algorithm". IEEE Transactions on Signal Processing. 62 (23): 6089–6104. arXiv:1401.7625. doi:10.1109/TSP.2014.2357775.
11. "TOMS Home". toms.acm.org. http://toms.acm.org/
12. Morales, J. L.; Nocedal, J. (2011). "Remark on 'Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization'". ACM Transactions on Mathematical Software. 38: 1–4. doi:10.1145/2049662.2049669.
13. "L-BFGS-B Nonlinear Optimization Code". users.iems.northwestern.edu. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html
14. "Orthant-Wise Limited-memory Quasi-Newton Optimizer for L1-regularized Objectives". Microsoft Download Center. https://www.microsoft.com/en-us/download/details.aspx?id=52452

Further reading
Liu, D. C.; Nocedal, J. (1989). "On the Limited Memory BFGS Method for Large Scale Optimization". Mathematical Programming B. 45 (3): 503–528. doi:10.1007/BF01589116.
Haghighi, Aria (2 Dec 2014). "Numerical Optimization: Understanding L-BFGS". http://aria42.com/blog/2014/12/understanding-lbfgs
Pytlak, Radoslaw (2009). "Limited Memory Quasi-Newton Algorithms". Conjugate Gradient Algorithms in Nonconvex Optimization. Springer. pp. 159–190. ISBN 978-3-540-85633-7.
