
Scaling ResNets in the Large-depth Regime

Abstract : Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth L increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor α_L. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is for α_L = 1/√L (other choices lead either to explosion or to identity mapping). This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to a widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and α_L = 1/L. Our analysis suggests a strong interplay between scaling and regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
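
The three regimes named in the abstract (explosion, non-trivial dynamics, identity mapping) can be illustrated on a toy model. The sketch below is not the model or the code of the paper: it assumes a purely linear residual update h_{k+1} = h_k + α_L W_k h_k with fresh i.i.d. Gaussian weights of variance 1/width at every layer, and the names `final_norm_ratio` and `expected_sq_ratio` are hypothetical. Under that assumption the expected squared norm is multiplied by (1 + α_L²) at each layer, so after L layers it grows like 2^L for α_L = 1, tends to e for α_L = 1/√L, and tends to 1 for α_L = 1/L.

```python
import math
import random


def final_norm_ratio(depth, alpha, width=16, seed=0):
    """Simulate a toy linear ResNet and return ||h_L|| / ||h_0||.

    Assumed update (illustration only): h_{k+1} = h_k + alpha * W_k h_k,
    where each W_k has i.i.d. N(0, 1/width) entries drawn independently
    of the other layers.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm0 = math.sqrt(sum(x * x for x in h))
    std = 1.0 / math.sqrt(width)
    for _ in range(depth):
        # Each output coordinate is one row of a fresh Gaussian matrix times h.
        update = [sum(rng.gauss(0.0, std) * x for x in h) for _ in range(width)]
        h = [x + alpha * u for x, u in zip(h, update)]
    return math.sqrt(sum(x * x for x in h)) / norm0


def expected_sq_ratio(depth, alpha):
    """Closed form of E||h_L||^2 / ||h_0||^2 for the toy model above."""
    return (1.0 + alpha ** 2) ** depth


if __name__ == "__main__":
    depth = 200
    scalings = [
        ("alpha = 1", 1.0),
        ("alpha = 1/sqrt(L)", 1.0 / math.sqrt(depth)),
        ("alpha = 1/L", 1.0 / depth),
    ]
    for name, alpha in scalings:
        simulated = final_norm_ratio(depth, alpha)
        expected = expected_sq_ratio(depth, alpha)
        print(f"{name:18s} simulated norm ratio = {simulated:.3e}, "
              f"expected squared ratio = {expected:.3e}")
```

Running the script for a few values of `depth` shows the first ratio growing without bound, the second staying of order one while still fluctuating with the seed, and the third approaching 1. The correlated initializations that the abstract associates with α_L = 1/L are not covered by this sketch.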
Document type : Preprints, Working Papers, ...
Contributor : Pierre Marion
Submitted on : Friday, June 17, 2022 - 11:20:48 AM
Last modification on : Saturday, October 22, 2022 - 5:13:01 AM


  • HAL Id : hal-03697725, version 1


Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert. Scaling ResNets in the Large-depth Regime. 2022. ⟨hal-03697725⟩


