Table of contents
1. Introduction
2. Second-order optimization algorithm
3. BFGS algorithm
4. L-BFGS algorithm
4.1. Maths behind the L-BFGS algorithm
5. FAQs
6. Key Takeaways
Last Updated: Mar 27, 2024

Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm

Author: Soham Medewar

Introduction

Jorge Nocedal is the creator of this algorithm. It was developed at the Optimization Center, a joint venture of Argonne National Laboratory and Northwestern University. Because it uses much less memory than the BFGS algorithm, it is well suited to large datasets.

Before moving on to the L-BFGS algorithm, let us quickly revise the second-order optimization algorithm and the BFGS algorithm.

Second-order optimization algorithm

The second-order derivative is the derivative of the derivative: the rate of change of the rate of change.

It can be used to locate the optima of the objective function more efficiently.

The second-order derivative tells us which direction to move and also estimates how far to move in that direction, called the step size.

All second-order methods calculate or approximate the Hessian. We can think of the Hessian as the derivative of the Jacobian: a matrix of second-order partial derivatives, analogous to tracking acceleration rather than speed. The Hessian's job is to describe the curvature at each point of the Jacobian. Second-order methods include:

  • Limited-memory BFGS (L-BFGS)
  • Conjugate gradient
  • Hessian free
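To make the Hessian concrete, here is a minimal NumPy sketch that approximates it as the matrix of second-order partial derivatives using central finite differences. The function f and the evaluation point are arbitrary illustrative choices, not something from a particular library:

```python
import numpy as np

def f(v):
    # Example objective: f(x, y) = x^2 + 3xy + 5y^2
    x, y = v
    return x**2 + 3*x*y + 5*y**2

def hessian_fd(f, v, h=1e-4):
    """Finite-difference Hessian: the matrix of second-order
    partial derivatives d^2 f / (dx_i dx_j), estimated numerically."""
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            vpp = v.copy(); vpp[i] += h; vpp[j] += h
            vpm = v.copy(); vpm[i] += h; vpm[j] -= h
            vmp = v.copy(); vmp[i] -= h; vmp[j] += h
            vmm = v.copy(); vmm[i] -= h; vmm[j] -= h
            # Central difference for the mixed second partial derivative.
            H[i, j] = (f(vpp) - f(vpm) - f(vmp) + f(vmm)) / (4 * h * h)
    return H

H_fd = hessian_fd(f, np.array([1.0, 2.0]))
# The analytic Hessian of this f is [[2, 3], [3, 10]] everywhere.
```

For a quadratic like this the Hessian is constant; for a general objective it varies with the point, which is exactly the curvature information second-order methods exploit.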

BFGS algorithm

The BFGS algorithm is named after its four co-discoverers: Broyden, Fletcher, Goldfarb, and Shanno.

  • It is a local search algorithm intended for convex optimization problems with a single optimum.
  • The BFGS algorithm belongs to the family of quasi-Newton methods, which extend Newton's optimization method.
  • Newton's optimization method is a second-order optimization algorithm that uses the Hessian matrix.

Newton's method requires calculating the inverse of the Hessian matrix. Computing the Hessian is computationally expensive, and inverting it can be numerically unstable depending on the properties of the objective function; this is one of the limitations of Newton's method. Quasi-Newton methods are second-order optimization algorithms that approximate the inverse of the Hessian using gradient information, so the Hessian and its inverse do not need to be available or computed exactly at each step of the algorithm.
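The ideas above can be sketched in a few lines of NumPy. This is an illustrative quasi-Newton loop, not a production implementation: it maintains an approximation B to the inverse Hessian (so no Hessian is ever computed or inverted exactly) and refines B with the rank-two BFGS update; the backtracking line search and tolerances are arbitrary choices:

```python
import numpy as np

def bfgs_minimize(f, grad, x0, iters=100, tol=1e-8):
    """Minimal BFGS sketch: keep an approximation B of the inverse
    Hessian and improve it from gradient differences at each step."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    B = np.eye(n)                       # inverse-Hessian approximation
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        p = -B @ g                      # quasi-Newton search direction
        # Backtracking (Armijo) line search for a step length.
        a = 1.0
        while f(x + a * p) > f(x) + 1e-4 * a * (g @ p):
            a *= 0.5
        s = a * p                       # step taken
        g_new = grad(x + s)
        y = g_new - g                   # change in the gradient
        sy = s @ y
        if sy > 1e-12:                  # curvature condition keeps B positive definite
            rho = 1.0 / sy
            I = np.eye(n)
            B = (I - rho * np.outer(s, y)) @ B @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x + s, g_new
    return x
```

On a convex quadratic f(x) = ½ xᵀAx − bᵀx (gradient Ax − b), this loop converges to the exact minimizer A⁻¹b without ever forming or inverting A's Hessian directly.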

L-BFGS algorithm

L-BFGS is an optimization algorithm in the family of quasi-Newton methods. Its name indicates that it is a variation of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm that limits how much curvature information is stored in memory. In particular, the algorithm never forms the full Hessian matrix, which would be far more computationally expensive.

L-BFGS approximates the inverse Hessian matrix to steer the search for weight adjustments toward more promising areas of parameter space. Whereas BFGS stores a dense n × n approximation to the inverse Hessian, L-BFGS stores only a few vectors that represent the approximation implicitly. This makes L-BFGS much cheaper per step while still exploiting approximate second-order information. In practice, L-BFGS and conjugate gradient can be faster and more stable than SGD methods.
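The "few vectors" L-BFGS keeps are the most recent step/gradient-difference pairs (sᵢ, yᵢ). The classic two-loop recursion applies the implicit inverse-Hessian approximation to a vector without ever building an n × n matrix. A minimal sketch, with illustrative names and no history-size management:

```python
import numpy as np

def two_loop(g, s_hist, y_hist):
    """L-BFGS two-loop recursion (sketch): returns H_k @ g, where H_k is
    the inverse-Hessian approximation defined by the stored (s, y) pairs."""
    q = g.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # First loop: walk the history from the newest pair to the oldest.
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Scale by gamma = s.y / y.y, a cheap diagonal guess for the initial Hessian.
    if s_hist:
        gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: walk back from the oldest pair to the newest.
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        beta = rho * (y @ r)
        r = r + (a - beta) * s
    return r   # the search direction is then -r
```

Only the stored vectors are touched, so memory and per-step cost are O(mn) for m stored pairs instead of O(n²). The resulting approximation satisfies the secant equation H_k y_latest = s_latest exactly.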

Maths behind L-BFGS algorithm

A second-order approximation is used to find the minimum of the function f(x) in these methods. The Taylor series of f(x) is:

f(x + Δx) ≈ f(x) + ∇f(x)ᵀ Δx + ½ Δxᵀ H Δx

Here ∇f is the gradient of the function and H is the Hessian.

The Taylor series of the gradient looks like this:

∇f(x + Δx) ≈ ∇f(x) + H Δx

Setting this gradient to zero to find the minimum, we get:

Δx = −H⁻¹ ∇f(x)

Now the Hessian comes into play. Every method in this family approximates the Hessian (or its inverse) in its own way.
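For a quadratic objective the Taylor expansion is exact, so a single step Δx = −H⁻¹∇f(x) lands exactly on the minimizer. A quick numerical check of that claim (H, b, and the starting point are arbitrary examples):

```python
import numpy as np

# Quadratic f(x) = 1/2 x^T H x - b^T x, whose gradient is Hx - b
# and whose unique minimizer is H^{-1} b (H positive definite).
H = np.array([[4.0, 1.0], [1.0, 3.0]])   # Hessian
b = np.array([1.0, 2.0])

x = np.array([5.0, -7.0])                # arbitrary starting point
grad = H @ x - b                         # gradient of f at x
x_min = x - np.linalg.solve(H, grad)     # Newton step: dx = -H^{-1} grad
# x_min now equals the true minimizer H^{-1} b in one step.
```

For non-quadratic objectives the expansion is only approximate, which is why Newton-type methods iterate, and why quasi-Newton methods get away with approximating H⁻¹ rather than computing it.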

FAQs

  1. Is BFGS gradient-based?
    Yes. The BFGS Hessian approximation is built from gradients: it can be based on the full history of gradients, in which case it is referred to as BFGS, or only on the most recent m gradients, in which case it is known as limited-memory BFGS, abbreviated as L-BFGS.
     
  2. What is BFGS in machine learning?
    BFGS is a second-order optimization algorithm. It is an acronym formed from the names of its four co-discoverers: Broyden, Fletcher, Goldfarb, and Shanno.
     
  3. What are second-order optimization methods?
    Second-order optimization techniques extend first-order optimization in neural networks. They use the additional curvature information of an objective function to adaptively estimate the step length along the optimization trajectory during the training phase of the neural network.

Key Takeaways

In this article, we have discussed the following topics:

  • Second-order optimization algorithm
  • BFGS algorithm
  • L-BFGS algorithm


Happy Coding!
