Abstract
We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient width. The algorithm presented here, dubbed δ-GClip, introduces a modification to gradient clipping that yields a first-of-its-kind example of a step-size schedule for gradient descent that provably minimizes training losses of deep neural nets. We also present empirical evidence that our theoretically founded δ-GClip algorithm is competitive with state-of-the-art deep learning heuristics on various neural architectures, including modern transformer-based architectures. The modification we make to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Łojasiewicz inequality which was recently proven to hold for sufficiently wide neural networks of any depth within a neighbourhood of the initialization.
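To make the abstract's description concrete, the sketch below shows one plausible form of such a regularized clipping step, in which the usual clipping multiplier min(1, γ/‖∇L(w)‖) is kept bounded away from zero by a floor δ. This is a minimal illustration under that assumption, not the paper's verbatim algorithm; the names `delta_gclip_step`, `eta`, `gamma`, and `delta` are illustrative, not taken from the source.

```python
import numpy as np

def delta_gclip_step(w, grad, eta=0.1, gamma=1.0, delta=1e-2):
    """One gradient step with a delta-regularized clipping factor.

    The standard clipping multiplier min(1, gamma / ||grad||) is
    lower-bounded by delta, so the effective step size never collapses
    to zero when the gradient norm is very large. Illustrative sketch
    only; hyperparameter names and defaults are assumptions.
    """
    grad_norm = np.linalg.norm(grad)
    scale = min(1.0, max(delta, gamma / (grad_norm + 1e-12)))
    return w - eta * scale * grad

# Toy usage: minimize the quadratic loss 0.5 * ||w||^2, whose gradient is w.
w = np.array([10.0, -4.0])
for _ in range(100):
    w = delta_gclip_step(w, grad=w)
print(w)  # converges towards the origin
```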
| Original language | English |
|---|---|
| Journal | Transactions on Machine Learning Research |
| Volume | 2025-June |
| Publication status | Published - 2025 |
Projects
1 Active

- MCAIF: Centre for AI Fundamentals
  Kaski, S. (PI), Alvarez, M. (Researcher), Pan, W. (Researcher), Mu, T. (Researcher), Rivasplata, O. (PI), Sun, M. (PI), Mukherjee, A. (PI), Caprio, M. (PI), Sonee, A. (Researcher), Leroy, A. (Researcher), Wang, J. (Researcher), Lee, J. (Researcher), Parakkal Unni, M. (Researcher), Sloman, S. (Researcher), Menary, S. (Researcher), Quilter, T. (Researcher), Hosseinzadeh, A. (PGR student), Mousa, A. (PGR student), Glover, E. (PGR student), Das, A. (PGR student), DURSUN, F. (PGR student), Zhu, H. (PGR student), Abdi, H. (PGR student), Dandago, K. (PGR student), Piriyajitakonkij, M. (PGR student), Rachman, R. (PGR student), Shi, X. (PGR student), Keany, T. (PGR student), Liu, X. (PGR student), Jiang, Y. (PGR student), Wan, Z. (PGR student), Harrison, M. (Support team), Machado, M. (Support team), Hartford, J. (PI), Kangin, D. (Researcher), Harikumar, H. (PI), Dubey, M. (PI), Parakkal Unni, M. (PI), Dash, S. P. (PGR student), Mi, X. (PGR student) & Barlas, Y. (PGR student)
  1/10/21 → 30/09/26
  Project: Research
- Global Convergence of SGD On Two Layer Neural Nets
  Gopalani, P. & Mukherjee, A., 20 Jan 2025, In: Information and Inference: a Journal of the IMA. 22 p.
  Research output: Contribution to journal › Article › peer-review
  Open Access
- Langevin Monte-Carlo Provably Learns Depth Two Neural Nets at Any Size and Data
  Kumar, D., Jha, S. & Mukherjee, A., 13 Mar 2025, (Submitted) arXiv, p. 1-10, 10 p.
  Research output: Preprint/Working paper › Preprint
- Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
  Gopalani, P., Jha, S. & Mukherjee, A., 25 Feb 2024, In: Transactions on Machine Learning Research. 14, 1, p. 1-21, 21 p.
  Research output: Contribution to journal › Article › peer-review
  Open Access