The most important papers in deep learning, pt. 4: Initialization, Regularization, Activations and Optimization

Understanding the difficulty of training deep feedforward neural networks, X. Glorot, 2010

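Glorot and Bengio analyse how activation and gradient variances propagate through depth and derive the now-standard "Xavier" initialization, which scales weights by the layer's fan-in and fan-out. A minimal NumPy sketch of the normalized (uniform) variant; the helper name is mine:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Normalized initialization: variance 2 / (fan_in + fan_out) keeps
    # activation and gradient magnitudes roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```
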
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, A. Saxe, 2014

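Saxe et al. work out exact learning dynamics for deep linear networks and motivate (scaled) orthogonal weight initialization, whose singular values are all one, so signals neither explode nor vanish with depth. A rough NumPy sketch, with helper and parameter names of my own:

```python
import numpy as np

def orthogonal_init(shape, gain=1.0, rng=np.random.default_rng(0)):
    # QR-decompose a random Gaussian matrix and keep the orthonormal factor.
    # Assumes shape[0] >= shape[1].
    a = rng.normal(size=shape)
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))  # sign correction for a uniform distribution
    return gain * q
```
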
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, K. He, 2015

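He et al. adapt the variance analysis to ReLU-style units (variance 2/fan_in instead of Glorot's 2/(fan_in + fan_out)) and introduce PReLU, a rectifier with a learned negative slope. A small sketch of both, names my own:

```python
import numpy as np

def he_normal(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Variance 2 / fan_in compensates for ReLU zeroing about half its inputs.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def prelu(x, a=0.25):
    # PReLU: identity for x > 0, learned slope `a` on the negative side.
    return np.where(x > 0, x, a * x)
```
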
Dropout: A simple way to prevent neural networks from overfitting, N. Srivastava, 2014

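Dropout randomly silences units during training so they cannot co-adapt; at test time the full network is used with rescaled activations. The sketch below uses the equivalent "inverted" convention, rescaling at training time so inference needs no change:

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    # Zero each unit with probability p_drop, then rescale the survivors.
    if not train or p_drop == 0.0:
        return x
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)
```
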
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, S. Ioffe, 2015 (+ Recurrent Batch Normalization, T. Cooijmans, 2016)

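Batch normalization standardizes each feature using the statistics of the current mini-batch, then restores expressive power with a learned scale (gamma) and shift (beta). A training-mode forward pass over a (batch, features) matrix; the running averages used at inference are omitted:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature with batch statistics, then scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```
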
Deep Sparse Rectifier Neural Networks, X. Glorot, 2011 (+ Fast and accurate deep network learning by exponential linear units, D.-A. Clevert, 2015)

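Glorot et al. argue for the rectifier as a sparse nonlinearity that is much easier to optimize than sigmoid or tanh; ELU later replaces the hard zero with a smooth negative saturation that keeps mean activations nearer zero. Both in a few lines:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Identity for x > 0, smooth saturation toward -alpha for x <= 0.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```
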
Bridging nonlinearities and stochastic regularizers with Gaussian error linear units, D. Hendrycks, 2016

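GELU weights its input by the standard normal CDF, x * Phi(x), which the paper motivates as the expectation of a stochastic zero-one gate. A sketch using the tanh approximation given in the paper (the exact form uses erf):

```python
import numpy as np

def gelu(x):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```
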
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, Y. Bengio, 2013

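Bengio et al. compare ways of backpropagating through hard, discrete-valued neurons; the simplest is the straight-through estimator, which uses the discrete value in the forward pass but treats the threshold as a (clipped) identity in the backward pass. A manual forward/backward sketch under that reading:

```python
import numpy as np

def ste_forward(x):
    # Hard binary neuron in the forward pass.
    return (x > 0).astype(x.dtype)

def ste_backward(grad_out, x):
    # Pass the gradient straight through; the |x| <= 1 clip is the
    # common hard-tanh variant of the estimator.
    return grad_out * (np.abs(x) <= 1.0)
```
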
The Concrete Distribution: A continuous relaxation of discrete random variables, C. Maddison, 2016 (+ Categorical reparameterization with Gumbel-Softmax, E. Jang, 2016)

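Both papers relax discrete sampling into a differentiable operation: perturb the logits with Gumbel noise and take a temperature-controlled softmax, which approaches a one-hot sample as the temperature goes to zero. A sketch, with parameter names of my own:

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, rng=np.random.default_rng(0)):
    u = rng.random(logits.shape)
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)   # Gumbel(0, 1) noise
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)       # soft, differentiable "sample"
```
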
On the importance of initialization and momentum in deep learning, I. Sutskever, 2013

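Sutskever et al. show that classical and Nesterov momentum, paired with well-scaled initialization, make plain SGD competitive for deep networks. One Nesterov-momentum step, written as a standalone function for illustration:

```python
def nesterov_step(w, v, grad_fn, lr=0.01, mu=0.9):
    # Evaluate the gradient at the look-ahead point w + mu * v,
    # then update the velocity and the weights.
    g = grad_fn(w + mu * v)
    v = mu * v - lr * g
    return w + v, v

# Tiny usage example: minimize f(w) = w^2.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = nesterov_step(w, v, grad_fn=lambda x: 2 * x)
```
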
Adaptive subgradient methods for online learning and stochastic optimization, J. Duchi, 2011 (+ Adadelta: an adaptive learning rate method, M. Zeiler, 2012)

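Adagrad gives every parameter its own step size, shrinking it by the square root of the accumulated squared gradients; Adadelta replaces the ever-growing accumulator with an exponential moving average so learning does not stall. The Adagrad update per step, as a sketch:

```python
import numpy as np

def adagrad_step(w, g, accum, lr=0.01, eps=1e-8):
    # accum holds the running sum of squared gradients per parameter.
    accum = accum + g * g
    w = w - lr * g / (np.sqrt(accum) + eps)
    return w, accum
```
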
Adam: A method for stochastic optimization, D. Kingma, 2014 (+ Incorporating Nesterov Momentum into Adam, T. Dozat, 2015)

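Adam keeps exponential moving averages of the gradient and its elementwise square, corrects both for their zero initialization, and scales the step by their ratio; Nadam (Dozat) additionally applies Nesterov-style look-ahead to the first moment. A single Adam step:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based step count, used for bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```
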
SGDR: Stochastic Gradient Descent with Warm Restarts, I. Loshchilov, 2016

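SGDR anneals the learning rate along a cosine from a maximum down toward a minimum, then restarts it at full strength. The full scheme also lengthens each cycle by a multiplier; this fixed-cycle sketch omits that detail:

```python
import numpy as np

def sgdr_lr(epoch, lr_min=1e-5, lr_max=0.1, cycle_len=10):
    # Cosine decay within each cycle, jumping back to lr_max at every restart.
    t = epoch % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + np.cos(np.pi * t / cycle_len))
```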