Deep Learning in Neural Networks: An Overview

News of August 6, 2017: This paper of 2015 just got the first Best Paper Award ever issued by the journal Neural Networks, founded in 1988.

Jürgen Schmidhuber
Pronounce: You_again Shmidhoobuh

J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, Volume 61, January 2015, Pages 85-117 (DOI: 10.1016/j.neunet.2014.09.003), published online in 2014.

Based on Preprint IDSIA-03-14 (88 pages, 888 references): arXiv:1404.7828 [cs.NE]; version v4 (PDF, 8 Oct 2014); LATEX source; complete public BIBTEX file (888 kB). (Older PDF versions: v1 of 30 April; v1.5 of 15 May; v2 of 28 May; v3 of 2 July.)

BibTeX:
@article{888,
  author  = "J. Schmidhuber",
  title   = "Deep Learning in Neural Networks: An Overview",
  journal = "Neural Networks",
  pages   = "85--117",
  volume  = "61",
  doi     = "10.1016/j.neunet.2014.09.003",
  note    = "Published online 2014; based on TR arXiv:1404.7828 [cs.NE]",
  year    = "2015"
}

Abstract. In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

As a machine learning researcher, I am obsessed with credit assignment. If you know of references to add or correct, please send them with brief explanations to juergen@idsia.ch, preferably together with URL links to PDFs for verification. Between 16 April and 8 October 2014, drafts of this paper underwent massive open online peer review through public mailing lists including connectionists@cs.cmu.edu, ml-news@googlegroups.com, compneuro@neuroinf.org, genetic_programming@yahoogroups.com, rl-list@googlegroups.com, imageworld-@diku.dk, and Google+. Thanks to numerous experts for valuable comments!

The contents of this paper may be used for educational and non-commercial purposes, including articles for Wikipedia and similar sites.

Table of Contents

1 Introduction to Deep Learning (DL) in Neural Networks (NNs)

2 Event-Oriented Notation for Activation Spreading in Feedforward NNs (FNNs) and Recurrent NNs (RNNs)

3 Depth of Credit Assignment Paths (CAPs) and of Problems

4 Recurring Themes of Deep Learning

4.1 Dynamic Programming for Supervised / Reinforcement Learning (SL / RL)
4.2 Unsupervised Learning (UL) Facilitating SL and RL
4.3 Learning Hierarchical Representations Through Deep SL, UL, RL
4.4 Occam's Razor: Compression and Minimum Description Length (MDL)
4.5 Fast Graphics Processing Units (GPUs) for DL in NNs

5 Supervised NNs, Some Helped by Unsupervised NNs (with Deep Learning Timeline)

5.1 Early NNs Since the 1940s (and the 1800s)
5.2 Around 1960: Visual Cortex Provides Inspiration for DL (Compare Sec. 5.4, 5.11)
5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH)
5.4 1979: Convolution + Weight Replication + Subsampling (Neocognitron)
5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs
5.5.1 BP for Weight-Sharing Feedforward NNs (FNNs) and Recurrent NNs (RNNs)
5.6 Late 1980s-2000: Numerous Improvements of NNs
5.6.1 Ideas for Dealing with Long Time Lags and Deep CAPs
5.6.2 Better BP Through Advanced Gradient Descent (Compare Sec. 5.24)
5.6.3 Searching For Simple, Low-Complexity, Problem-Solving NNs (Compare Sec. 5.24)
5.6.4 Potential Benefits of UL for SL (Compare Sec. 5.7, 5.10, 5.15)
5.7 1987: UL Through Autoencoder (AE) Hierarchies (Compare Sec. 5.15)
5.8 1989: BP for Convolutional NNs (CNNs, Sec. 5.4)
5.9 1991: Fundamental Deep Learning Problem of Gradient Descent
5.10 1991: UL-Based History Compression Through a Deep Hierarchy of RNNs
5.11 1992: Max-Pooling (MP): Towards MPCNNs (Compare Sec. 5.16, 5.19)
5.12 1994: Early Contest-Winning NNs
5.13 1995: Supervised Recurrent Very Deep Learner (LSTM RNN)
5.14 2003: More Contest-Winning/Record-Setting NNs
5.15 2006/7: UL For Deep Belief Networks (DBNs) / AE Stacks Fine-Tuned by BP
5.16 2006/7: Improved CNNs / GPU-CNNs / BP-Trained MPCNNs / LSTM Stacks
5.17 2009: First Official Competitions Won by RNNs, and with MPCNNs
5.18 2010: Plain Backprop (+Distortions) on GPU Yields Excellent Results
5.19 2011: MPCNNs on GPU Achieve Superhuman Vision Performance
5.20 2011: Hessian-Free Optimization for RNNs
5.21 2012: First Contests Won on ImageNet & Object Detection & Segmentation
5.22 2013-: More Contests and Benchmark Records
5.23 Currently Successful Supervised Techniques: LSTM RNNs / GPU-MPCNNs
5.24 Recent Tricks for Improving SL Deep NNs (Compare Sec. 5.6.2, 5.6.3)
5.25 Consequences for Neuroscience
5.26 DL with Spiking Neurons?

6 DL in FNNs and RNNs for Reinforcement Learning (RL)

6.1 RL Through NN World Models Yields RNNs With Deep CAPs
6.2 Deep FNNs for Traditional RL and Markov Decision Processes (MDPs)
6.3 Deep RL RNNs for Partially Observable MDPs (POMDPs)
6.4 RL Facilitated by Deep UL in FNNs and RNNs
6.5 Deep Hierarchical RL (HRL) and Subgoal Learning with FNNs and RNNs
6.6 Deep RL by Direct NN Search / Policy Gradients / Evolution
6.7 Deep RL by Indirect Policy Search / Compressed NN Search
6.8 Universal RL

7 Conclusion

Overview web sites with lots of additional details and papers on Deep Learning

[A] 1991: Fundamental Deep Learning Problem discovered and analysed: in standard NNs, backpropagated error gradients tend to vanish or explode. More

[B] Our first Deep Learner of 1991 (RNN stack pre-trained in unsupervised fashion) + Deep Learning timeline 1962-2013. More, also under www.deeplearning.me

[C] 2009: First recurrent Deep Learner to win international competitions with secret test sets: deep LSTM recurrent neural networks [H] won three connected handwriting contests at ICDAR 2009 (French, Arabic, Farsi), performing simultaneous segmentation and recognition. More

[D] Very Deep Learning 1991-2013 - our deep NNs have, so far, won 9 important contests in pattern recognition, image segmentation, object detection. More, also under www.deeplearning.it

[E] 2011: First superhuman visual pattern recognition in an official international competition (with secret test set known only to the organisers) - twice as good as humans, three times as good as the closest artificial NN competitor, and six times as good as the best non-neural method. More

[F] 2012: First Deep Learner to win a contest on object detection in large images: our deep NNs won both the ICPR 2012 Contest and the MICCAI 2013 Grand Challenge on Mitosis Detection (important for cancer prognosis etc., perhaps the most important application area of Deep Learning). More

[G] 2012: First Deep Learner to win a pure image segmentation competition: our deep NNs won the ISBI'12 Brain Image Segmentation Challenge (relevant for the billion-euro brain projects in the EU and US). More

[H] Deep LSTM recurrent NNs since 1995: More

[I] Deep Evolving NNs: More

[J] Deep Reinforcement Learning NNs: More

[K] Compressed NN Search for Huge RNNs: More

[L] Who Invented Backpropagation? A brief history of Deep Learning's central algorithm 1960-1981 and beyond: More

[M] Slides of talk on Deep Learning (2015): More
