Mean-field Analysis for Heavy Ball Methods: Dropout-stability, Connectivity, and Global Convergence
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21230/23:00375312 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F23%3A00375312)
Result on the web
https://openreview.net/pdf?id=gZna3IiGfl
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Mean-field Analysis for Heavy Ball Methods: Dropout-stability, Connectivity, and Global Convergence
Original language description
The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD while, in contrast, our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limit differential equations, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout-stability and connectivity of SHB solutions.
Czech name
—
Czech description
—
Classification
Type
Jost - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
EF16_019/0000765: Research Center for Informatics
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2023
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Transactions on Machine Learning Research
ISSN
2835-8856
e-ISSN
2835-8856
Volume of the periodical
—
Issue of the periodical within the volume
February
Country of publishing house
DE - GERMANY
Number of pages
49
Pages from-to
1-49
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-105000206429