Here I deliberately omitted Boltzmann machines and deep belief nets, since they’re state of the art on exactly nothing. There is no complete, operational unified theory of deep learning in neural networks as of yet – hence most insights come from borrowing tools from adjacent sub-fields of science, such as statistical mechanics, multiscale analysis or Riemannian geometry, and pushing them as far as possible to get closed-form expressions. As a result, these papers are technically fairly hardcore.
Spin-glass models of neural networks, D. Amit, 1985
Flat minima, S. Hochreiter, 1997
How transferable are features in deep neural networks?, J. Yosinski, 2014
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Y. Dauphin, 2014
A Mathematical Motivation for Complex-Valued Convolutional Networks, J. Bruna, 2015 (+ Invariant Scattering Convolution Networks, J. Bruna, 2012)
The loss surfaces of multilayer networks, A. Choromanska, 2015
Transition to chaos in random neuronal networks, J. Kadmon, 2015
Maximally informative hierarchical representations of high-dimensional data, G. Ver Steeg, 2015 (+ Variational Information Maximization for feature selection, S. Gao, 2016)
On the expressive power of deep neural networks, M. Raghu, 2017 (+ Deep Information Propagation, S. Schoenholz, 2017)
Deep learning without poor local minima, K. Kawaguchi, 2016
Deep neural networks with random Gaussian weights: a universal classification strategy?, R. Giryes, 2016 (+ Robust Large Margin Deep Neural Networks, J. Sokolic, 2016)
Topology and geometry of half-rectified network optimization, C. Freeman, 2016
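To give a flavour of the saddle-point story from the Dauphin et al. and Kawaguchi entries above, here is a minimal sketch (my own toy example, not taken from either paper): the two-weight "deep linear network" loss (w1·w2 − 1)² has a critical point at the origin where the gradient vanishes but the Hessian has one negative eigenvalue – a saddle, not a poor local minimum.

```python
import numpy as np

# Toy two-layer linear "network": loss L(w1, w2) = (w1 * w2 - 1)^2.
# The origin is a critical point, but it is a saddle, not a minimum --
# the kind of point the high-dimensional non-convex optimization
# literature argues dominates deep-network loss surfaces.

def loss(w):
    w1, w2 = w
    return (w1 * w2 - 1.0) ** 2

def hessian(w, eps=1e-5):
    """Numerical Hessian via central finite differences."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            H[i, j] = (loss(w + e_i + e_j) - loss(w + e_i - e_j)
                       - loss(w - e_i + e_j) + loss(w - e_i - e_j)) / (4 * eps ** 2)
    return H

w0 = np.zeros(2)                         # gradient of the loss vanishes here
eigvals = np.linalg.eigvalsh(hessian(w0))
print(eigvals)                           # one negative, one positive eigenvalue -> saddle
```

The analytic Hessian at the origin is [[0, −2], [−2, 0]], with eigenvalues ±2; gradient descent stalls near such points even though escape directions exist.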