

Poster

The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes: Sharp Estimates through Large Deviations

Waïss Azizian · Franck Iutzeler · Jérôme Malick · Panayotis Mertikopoulos

Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT

Abstract:

In this paper, we examine the time it takes for stochastic gradient descent (SGD) to reach the global minimum of a general, non-convex loss function. We approach this question through the lens of large deviations theory and randomly perturbed dynamical systems, and we provide a tight characterization of the associated hitting times of SGD with matching upper and lower bounds. Our analysis reveals that the global convergence time of SGD is dominated by the most "costly" set of obstacles that the algorithm may need to overcome in order to reach a global minimizer, coupling in this way the geometry of the underlying loss landscape with the statistics of the noise entering the process. Finally, motivated by applications to the training of deep neural networks, we provide a series of refinements and extensions of our analysis to, among others, loss functions with no spurious local minima or ones with bounded depths.
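As a rough guide to the objects the abstract refers to, the sketch below is a minimal schematic written with assumed notation (loss $f$, step size $\eta$, gradient noise $\xi_k$, barrier cost $E$); it illustrates the classical Freidlin--Wentzell / Kramers-type exponential scaling of hitting times under small noise, not the paper's exact statements or rates.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% SGD iterates on a non-convex loss f with step size \eta and gradient noise \xi_k
% (notation is illustrative, not taken verbatim from the paper):
\[
  x_{k+1} = x_k - \eta\bigl(\nabla f(x_k) + \xi_k\bigr)
\]

% Hitting time of an \varepsilon-neighborhood of the set of global minimizers:
\[
  \tau_\varepsilon
    = \inf\bigl\{\, k \ge 0 :
      \operatorname{dist}\bigl(x_k, \arg\min f\bigr) \le \varepsilon \,\bigr\}
\]

% Classical large-deviations estimates for small-noise dynamics give exponential
% scaling in the largest barrier E separating the initialization from the global
% minimum, consistent with the "most costly set of obstacles" picture above:
\[
  \mathbb{E}\bigl[\tau_\varepsilon\bigr] \;\asymp\; \exp\!\bigl(E/\eta\bigr)
  \qquad \text{as } \eta \to 0
\]

\end{document}
```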
