Microsoft wins ImageNet 2015 through feedforward LSTM without gates

Jürgen Schmidhuber

Microsoft Research dominated the ImageNet 2015 contest with a deep neural network of 152 layers [1]. Congrats to Kaiming He & Xiangyu Zhang & Shaoqing Ren & Jian Sun on the great results [2]!

Their CNN layers compute G(F(x)+x), which is essentially a feedforward Long Short-Term Memory (LSTM) [3] without gates!
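
To make G(F(x)+x) concrete, here is a minimal sketch in plain Python. It assumes that F is two small fully connected maps with a ReLU in between (a stand-in for the convolutional layers of [1]) and that G is a ReLU. The helper names (relu, matvec, residual_layer) and the toy weights are illustrative assumptions, not code from [1].

```python
def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_layer(x, W1, W2):
    """Compute G(F(x) + x) with F = W2 * relu(W1 * x) and G = relu."""
    f = matvec(W2, relu(matvec(W1, x)))           # F(x): the learned residual
    summed = [fi + xi for fi, xi in zip(f, x)]    # F(x) + x: identity shortcut
    return relu(summed)                           # G(...)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    zeros = [[0.0] * 3 for _ in range(3)]
    # With all-zero weights F(x) = 0, so the layer passes a non-negative x through unchanged.
    print(residual_layer(x, zeros, zeros))
```

In this sketch the layer only has to learn the deviation F(x) from the identity map. If F contributes nothing, a non-negative input passes through unchanged.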

Their net is similar to the very deep Highway Networks [4] (with hundreds of layers), which are feedforward LSTMs with forget gates (= gated recurrent units) [5].
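
For comparison, a single Highway layer as formulated in [4] computes y = H(x)·T(x) + x·(1 − T(x)), where T is a sigmoid "transform gate". The sketch below is a toy fully connected version in plain Python. The function names, the choice of tanh for H, and the example weights are assumptions made for illustration only.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, b, v):
    """Compute W * v + b for a matrix W (list of rows) and vectors b, v."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = H(x) * T(x) + x * (1 - T(x)), with H = tanh(affine), T = sigmoid(affine)."""
    h = [math.tanh(a) for a in affine(W_h, b_h, x)]   # candidate transform H(x)
    t = [sigmoid(a) for a in affine(W_t, b_t, x)]     # transform gate T(x) in (0, 1)
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


if __name__ == "__main__":
    x = [0.5, -1.0]
    ident = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # A strongly negative gate bias closes the transform gate, so the layer copies x.
    print(highway_layer(x, ident, [0.0, 0.0], zeros, [-30.0, -30.0]))
    # A strongly positive gate bias opens it, so the layer outputs roughly tanh(x).
    print(highway_layer(x, ident, [0.0, 0.0], zeros, [30.0, 30.0]))
```

Removing the gates altogether and simply adding the two paths gives H(x) + x, which is the ungated sum inside G(F(x)+x).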

The authors mention the vanishing gradient problem, but do not mention my very first student Sepp Hochreiter (now professor) who identified and analyzed this fundamental deep learning problem in 1991, years before anybody else did [6].

Apart from the above, I liked the paper [1] a lot. LSTM concepts keep invading CNN territory [e.g., 7a-e], also through GPU-friendly multi-dimensional LSTMs [8].


[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arxiv:1512.03385

[2] ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015): Results

[3] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, TUM (1995). The paper led to a lot of follow-up work, and LSTM is now heavily used by leading IT companies all over the world.

[4] R. K. Srivastava, K. Greff, J. Schmidhuber. Training Very Deep Networks. NIPS 2015; arxiv:1505.00387.

[5] F. A. Gers, J. Schmidhuber, F. Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000.

[6] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TU Munich, 1991. Advisor: J. Schmidhuber.

[7a] 2011: First superhuman CNNs
[7b] 2011: First human-competitive CNNs for handwriting
[7c] 2012: First CNN to win segmentation contest
[7d] 2012: First CNN to win contest on object discovery in large images
[7e] Deep Learning. Scholarpedia, 10(11):32832, 2015

[8] M. Stollenga, W. Byeon, M. Liwicki, J. Schmidhuber. Parallel Multi-Dimensional LSTM, with Application to Fast Biomedical Volumetric Image Segmentation. NIPS 2015; arxiv:1506.07452.
