Computing and Mathematical Sciences Colloquium
Optimization is a core primitive in statistics, machine learning, and data analytics. Within these fields, the rapid growth in the scale and complexity of modern datasets has led to a focus on two classes of algorithms: gradient methods and momentum methods (also referred to as accelerated methods). Momentum methods, first proposed by Nesterov in 1983, achieve faster convergence rates than gradient methods. However, unlike gradient methods, they are not descent methods and providing robust performance guarantees remains a challenge. In the Euclidean setting, momentum methods can be understood as modeling the dynamics of a damped harmonic oscillator; making this intuition precise however, and generalizing it to other geometries has been difficult. Furthermore, derivations of momentum methods do not flow from a single underlying principle, but tend to rely on case-specific algebra using a technique--considered by many to be esoteric--called estimate sequences.
The first part of our work introduces a variational, continuous-time framework for understanding momentum methods. We show that there is a family of Lagrangian functionals, that we call Bregman Lagrangians, which generate dynamics corresponding to momentum methods in continuous time. In particular, momentum methods can be understood as arising from various discretization techniques applied to these continuous time dynamics. The second part of our work strengthens this connection. We demonstrate how to derive families of Lyapunov functions which can certify rates of convergence for the continuous time momentum dynamics. We further demonstrate how the proofs of convergence of momentum methods can be understood as bounding discretization errors of the Lyapunov function when moving to discrete time. Along the way, we prove an equivalence between these family of Lyapunov functions and the technique of estimate sequences. The following is joint work with Andre Wibisono, Stephen Tu, Shivaram Venkataraman, Alex Gittens, Benjamin Recht, and Michael I. Jordan.
This lecture is part of the Young Investigators Lecture Series sponsored by the Caltech Division of Engineering & Applied Science.