DeepMind's Nature Paper and Earlier Related Work

Jürgen Schmidhuber
Pronounce: You_again Shmidhoobuh
26 February 2015 (updated April 2015)

The first four members of DeepMind included two former PhD students of my research group at the Swiss AI Lab IDSIA. Two additional key members of DeepMind also earned their PhD degrees in my lab. Nevertheless, I am not quite happy with DeepMind's recent publication in Nature [2], although three of its authors were trained here.

The abstract of DeepMind's paper [2] on learning to play video games claims: "While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces." It also claims to bridge "the divide between high-dimensional sensory inputs and actions." Similarly, the first sentence of the abstract of the earlier tech report version [1] of the article [2] claims to "present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning."

However, the first such system [3] was created earlier by other researchers at IDSIA.

The earlier system [3] was indeed able to "learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning" (quote from the abstract [2]), without any unsupervised pre-training. It was successfully applied to various problems such as video game-based race car driving from high-dimensional visual input streams. See the post-training movie.

The earlier system [3] uses recent compressed recurrent neural networks [4] to deal with sequential video inputs in partially observable environments. The system of [2], by contrast, uses more limited feedforward networks and other techniques from over two decades ago, namely CNNs [5,6], experience replay [7], and temporal difference-based game playing as in the famous self-teaching backgammon player [8], which 20 years ago already reached the level of human world champions (while the Nature paper [2] reports "more than 75% of the human score on more than half of the games"). After minimal preprocessing in both cases [3; 2, Methods], the visual input to both learning systems remains high-dimensional.
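For readers unfamiliar with the decades-old techniques named above, here is a minimal toy sketch of experience replay [7] combined with a temporal-difference Q-learning update. All names, parameters, and the tabular value function are illustrative assumptions for exposition; this is not the implementation of [2].

```python
import random
from collections import deque

# Toy experience replay [7] with a tabular temporal-difference
# Q-learning update; all constants are illustrative.

GAMMA = 0.9        # discount factor
ALPHA = 0.1        # learning rate
BUFFER_SIZE = 1000
BATCH_SIZE = 4

replay_buffer = deque(maxlen=BUFFER_SIZE)  # stores (s, a, r, s') tuples
Q = {}  # tabular action-value function: Q[(state, action)] -> float

def q(state, action):
    return Q.get((state, action), 0.0)

def store(transition):
    """Append one (state, action, reward, next_state) transition."""
    replay_buffer.append(transition)

def replay_update(actions):
    """Sample past transitions uniformly and apply the TD update."""
    batch = random.sample(replay_buffer, min(BATCH_SIZE, len(replay_buffer)))
    for s, a, r, s_next in batch:
        best_next = max(q(s_next, a2) for a2 in actions)
        td_target = r + GAMMA * best_next
        Q[(s, a)] = q(s, a) + ALPHA * (td_target - q(s, a))
```

The point of replaying randomly sampled old transitions, rather than only the most recent one, is to decorrelate successive updates and reuse experience.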

In 2013, neuroevolution-based reinforcement learning (e.g., survey [12]) also successfully learned to play Atari games [9].
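To make the neuroevolution idea concrete: instead of computing gradients, one mutates candidate weight vectors and selects by episode return (fitness). The sketch below is a generic, hypothetical truncation-selection loop with a stand-in fitness function, not the actual code of [3] or [9].

```python
import random

# Toy neuroevolution loop: weight vectors are mutated and selected
# by fitness. N_WEIGHTS, POP_SIZE, SIGMA are illustrative choices.

N_WEIGHTS = 8
POP_SIZE = 20
SIGMA = 0.1  # mutation strength

def fitness(weights):
    # Stand-in for "play one episode and return the score";
    # a simple analytic function keeps the sketch runnable.
    return -sum(w * w for w in weights)

def evolve(generations=50):
    population = [[random.gauss(0, 1) for _ in range(N_WEIGHTS)]
                  for _ in range(POP_SIZE)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:POP_SIZE // 4]  # truncation selection
        population = [
            [w + random.gauss(0, SIGMA) for w in random.choice(parents)]
            for _ in range(POP_SIZE)
        ]
    return max(population, key=fitness)
```

In game-playing settings, `fitness` would run the network as a policy for one episode and return the game score; no backpropagation is involved.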

The article [2] also claims "the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks". Since other learning systems can also solve quite diverse tasks, this claim seems debatable at least.

Numerous additional relevant references on "Deep Reinforcement Learning" can be found in Sec. 6 of a recent survey [10]. A recent TED talk [11] suggests that the system [1,2] was a reason why Google bought DeepMind, indicating commercial relevance of this topic.

Compare a popular G+ post on this and the corresponding reply in a recent AMA (Ask Me Anything) on reddit, as well as this online comment (27 March 2015).


[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with Deep Reinforcement Learning. Tech Report, 19 Dec. 2013.

[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533, 26 Feb. 2015.

[3] J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proc. Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, July 2013.

[4] J. Koutnik, F. Gomez, J. Schmidhuber. Evolving Neural Networks in Compressed Weight Space. In Proc. Genetic and Evolutionary Computation Conference (GECCO-2010), Portland, 2010.

[5] K. Fukushima. Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, J62-A(10):658-665, 1979.

[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Back-propagation applied to handwritten zip code recognition. Neural Computation, 1(4):541-551, 1989.

[7] L. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, 1993.

[8] G. Tesauro. TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.

[9] M. Hausknecht, J. Lehman, R. Miikkulainen, P. Stone. A Neuroevolution Approach to General Atari Game Playing. IEEE Transactions on Computational Intelligence and AI in Games, 16 Dec. 2013.

[10] J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, vol. 61, 85-117, 2015 (888 references, published online in 2014).

[11] L. Page. Where's Google going next? Transcript of TED event, 2014.

[12] L. P. Kaelbling, M. L. Littman, A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996. (This reference was added on 9 March 2015.)

Overview web sites with lots of additional details and papers on Deep Learning

[A] 1991: Fundamental Deep Learning Problem discovered and analysed: in standard NNs, backpropagated error gradients tend to vanish or explode.

[B] Our first Deep Learner of 1991 (RNN stack pre-trained in unsupervised fashion) + Deep Learning timeline 1962-2013.

[C] 2009: First recurrent Deep Learner to win international competitions with secret test sets: deep LSTM recurrent neural networks [H] won three connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi), performing simultaneous segmentation and recognition.

[D] Very Deep Learning 1991-2013 - our deep NNs have, so far, won 9 important contests in pattern recognition, image segmentation, and object detection.

[E] 2011: First superhuman visual pattern recognition in an official international competition (with secret test set known only to the organisers) - twice better than humans, three times better than the closest artificial NN competitor, six times better than the best non-neural method.

[F] 2012: First Deep Learner to win a contest on object detection in large images: our deep NNs won both the ICPR 2012 Contest and the MICCAI 2013 Grand Challenge on Mitosis Detection (important for cancer prognosis etc., perhaps the most important application area of Deep Learning).

[G] 2012: First Deep Learner to win a pure image segmentation competition: our deep NNs won the ISBI'12 Brain Image Segmentation Challenge (relevant for the billion Euro brain projects in EU and US).

[H] Deep LSTM recurrent NNs since 1995:

[I] Deep Evolving NNs:

[J] Deep Reinforcement Learning NNs:

[K] Compressed NN Search for Huge RNNs:

[L] Who Invented Backpropagation? A brief history of Deep Learning's central algorithm 1960-1981 and beyond:

[M] Slides of talk on Deep Learning (2014):