Jürgen Schmidhuber's page on
Recurrent Neural Networks
(updated 2017)

Why use recurrent networks at all? And why use a particular Deep Learning recurrent network called Long Short-Term Memory or LSTM?

1. Our Open Source RNN & LSTM Software Libraries: Brainstorm; RNNLIB; Pybrain.
2. Upcoming RNN Book
3. Old version of this page (2003)

Gradient LSTM Tutorial slides (2002)
RNN Book Preface (2011)

LSTM in Journals:

12. K. Greff, R. Srivastava, J. Koutnik, B. Steunebrink, J. Schmidhuber. LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2016.

11. A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009. PDF.

10. F. Gomez, J. Schmidhuber, R. Miikkulainen. Accelerated Neural Evolution through Cooperatively Coevolved Synapses. Journal of Machine Learning Research (JMLR), 9:937-965, 2008. PDF.

9. H. Mayer, F. Gomez, D. Wierstra, I. Nagy, A. Knoll, and J. Schmidhuber. A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. Advanced Robotics, 22/13-14, p. 1521-1537, 2008.

8. J. Schmidhuber, D. Wierstra, M. Gagliolo, F. Gomez. Training Recurrent Networks by Evolino. Neural Computation, 19(3): 757-779, 2007. PDF.

7. A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18:5-6, pp. 602-610, 2005. PDF.

6. J. A. Perez-Ortiz, F. A. Gers, D. Eck, J. Schmidhuber. Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Networks 16(2):241-250, 2003. PDF. PS.GZ.

5. F. Gers, N. Schraudolph, J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research 3:115-143, 2002. PDF. PS.GZ.

4. J. Schmidhuber, F. Gers, D. Eck. Learning nonregular languages: A comparison of simple recurrent networks and LSTM. Neural Computation 14(9):2039-2041, 2002. PS. PDF.

3. F. A. Gers and J. Schmidhuber. LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages. IEEE Transactions on Neural Networks 12(6):1333-1340, 2001. PDF. PS.GZ.

2. F. A. Gers and J. Schmidhuber and F. Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000. PDF. PS.GZ.

1. S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, TUM (1995). PDF. PS.GZ.


Compressed Network Search (1995-2013) can be used to find huge RNN controllers without a teacher, by evolving compact, compressed descriptions (programs) of large networks with over a million weights. Compare papers 38, 39, 44, 45, 47, 48 below.
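
To make this concrete, here is a minimal sketch (illustrative Python, not the authors' code) of the decoding step behind this line of work (compare papers 38 and 39 below): a short evolved genome of low-frequency DCT coefficients is expanded into a full weight matrix, so evolution only searches a space with a handful of dimensions instead of a million. Function names and sizes are made up for the example; scipy is assumed to be available.

```python
# Minimal sketch (not the authors' code) of decoding a compressed network description:
# a short genome of low-frequency DCT coefficients is expanded into a large weight
# matrix, so evolution searches a few dozen dimensions instead of a million.
import numpy as np
from scipy.fft import idctn   # inverse discrete cosine transform

def decode_weights(genome, shape=(1000, 1000)):
    """Expand a small coefficient genome into a full weight matrix via inverse DCT."""
    k = int(np.ceil(np.sqrt(len(genome))))     # side length of the coefficient block
    coeffs = np.zeros(shape)
    block = np.zeros(k * k)
    block[:len(genome)] = genome               # genome fills the low-frequency corner
    coeffs[:k, :k] = block.reshape(k, k)
    return idctn(coeffs, norm='ortho')         # smooth, full-size weight matrix

genome = np.random.randn(16)                   # 16 evolved numbers ...
W = decode_weights(genome)                     # ... specify 1,000,000 weights
```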

The human brain is a recurrent neural network (RNN): a network of neurons with feedback connections. It can learn many behaviors / sequence processing tasks / algorithms / programs that are not learnable by traditional machine learning methods. This explains the rapidly growing interest in artificial RNNs for technical applications: general computers which can learn algorithms to map input sequences to output sequences, with or without a teacher. They are computationally more powerful and biologically more plausible than other adaptive approaches such as Hidden Markov Models (no continuous internal states), feedforward networks and Support Vector Machines (no internal states at all). Our recent applications include adaptive robotics and control, handwriting recognition, speech recognition, keyword spotting, music composition, attentive vision, protein analysis, stock market prediction, and many other sequence problems.
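
As a concrete illustration of such "internal states", here is a minimal numpy sketch (illustrative only, not any particular library's API) of an RNN mapping an input sequence to an output sequence: the hidden vector h is carried across time steps, so the output at step t can depend on inputs seen long before, which a feedforward net or SVM applied frame by frame cannot do.

```python
# Minimal numpy sketch of an RNN mapping an input sequence to an output sequence.
# The hidden state h is the internal state that feedforward nets and SVMs lack.
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Return one output vector per input vector in the sequence xs."""
    h = np.zeros(W_hh.shape[0])               # internal state, carried across time
    ys = []
    for x in xs:                              # process the sequence step by step
        h = np.tanh(W_xh @ x + W_hh @ h)      # new state depends on input AND old state
        ys.append(W_hy @ h)                   # output at this time step
    return ys

# Example: 4-step sequence of 3-dim inputs, 5 hidden units, 2 output units (random weights).
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.standard_normal((5, 3)), rng.standard_normal((5, 5)), rng.standard_normal((2, 5))
outputs = rnn_forward([rng.standard_normal(3) for _ in range(4)], W_xh, W_hh, W_hy)
```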

Early RNNs of the 1990s could not learn to look far back into the past. Their problems were first rigorously analyzed on Schmidhuber's RNN long time lag project by his former PhD student Hochreiter (1991). A feedback network called "Long Short-Term Memory" (LSTM, Neural Comp., 1997) overcomes the fundamental problems of traditional RNNs, and efficiently learns to solve many previously unlearnable tasks involving:

1. Recognition of temporally extended patterns in noisy input sequences
2. Recognition of the temporal order of widely separated events in noisy input streams
3. Extraction of information conveyed by the temporal distance between events
4. Stable generation of precisely timed rhythms, smooth and non-smooth periodic trajectories
5. Robust storage of high-precision real numbers across extended time intervals.
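
The long-time-lag problem analyzed by Hochreiter can be illustrated with a scalar toy example (illustrative only; real networks multiply Jacobian matrices): an error signal backpropagated through a traditional RNN is repeatedly multiplied by the recurrent weight (times an activation derivative), so it vanishes or explodes exponentially with the number of time steps, whereas LSTM's fixed self-connection of 1.0 preserves it.

```python
# Toy illustration of vanishing / exploding error signals in a traditional RNN:
# backpropagating through 100 time steps repeatedly multiplies the signal by the
# recurrent weight; only a weight of exactly 1.0 (LSTM's linear unit) preserves it.
for w in (0.9, 1.0, 1.1):
    grad = 1.0
    for _ in range(100):                      # 100 steps back in time
        grad *= w
    print(f"recurrent weight {w}: error signal after 100 steps = {grad:.3g}")
# 0.9 -> ~2.7e-05 (vanishes), 1.0 -> 1 (preserved), 1.1 -> ~1.4e+04 (explodes)
```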

LSTM has transformed machine learning and Artificial Intelligence (AI), and is now available to billions of users through the world's four most valuable public companies: Apple (#1 as of March 31, 2017), Google (Alphabet, #2), Microsoft (#3), and Amazon (#4).

LSTM-controlled, knot-tying robot
Example from ref [19] below: LSTM-controlled multi-arm robot (above) uses Evolino to learn how to tie a knot (see next column, further down). The RNN's memory is necessary to deal with ambiguous sensory inputs from repetitively visited states.

Some benchmark records of 2013/2014 achieved with the help of LSTM RNNs, often at big IT companies:

1. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
2. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
3. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
6. English to French translation (Sutskever et al., Google, NIPS 2014)
7. Audio onset detection (Marchi et al., ICASSP 2014)
8. Social signal classification (Brueckner & Schuller, ICASSP 2014)
9. Arabic handwriting recognition (Bluche et al., DAS 2014)
10. TIMIT phoneme recognition (Graves et al., ICASSP 2013)
11. Optical character recognition (Breuel et al., ICDAR 2013)
12. Image caption generation (Vinyals et al., Google, 2014)
13. Video to textual description (Donahue et al., 2014)
14. Syntactic parsing for Natural Language Processing (Vinyals et al., Google, 2014)
15. Photo-real talking heads (Soong and Wang, Microsoft, 2014).

Also: end-to-end speech recognition (Hannun et al., Baidu, 2014) with our CTC-based RNNs (Graves et al., 2006), without any HMMs etc.
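
For readers unfamiliar with CTC (Graves et al., 2006): the network emits, at every input frame, either a label or a special "blank"; a label sequence is read off by collapsing repeated frame labels and then removing blanks. A minimal sketch of this greedy (best-path) decoding step, assuming for illustration that label id 0 denotes the blank:

```python
# Illustrative sketch of greedy (best-path) CTC decoding: collapse repeated frame
# labels, then drop the special "blank" label (assumed here to have id 0).
def ctc_greedy_collapse(frame_labels, blank=0):
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# e.g. frame-wise argmax [0, 7, 7, 0, 7, 3, 3, 0] -> label sequence [7, 7, 3]
print(ctc_greedy_collapse([0, 7, 7, 0, 7, 3, 3, 0]))
```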

Many of the references above and more history can be found in:
J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, 61: 85-117, 2015 (online 2014). (88 pages, 888 references, with PDF & LATEX source & complete public BIBTEX file; see also Google+ posts).

Today's LSTM algorithms were shaped by several theses of Schmidhuber's PhD students: Sepp Hochreiter (1999), Felix Gers (2001), Alex Graves (2008), Daan Wierstra (2010). More in the pipeline! Important contributions also came from postdocs including Fred Cummins, Santiago Fernandez, Faustino Gomez, and others.

LSTM recurrent neural network applications by (former) students & postdocs:

1. Recognition of connected handwriting: our LSTM RNNs (trained by CTC) outperform all other known methods on the difficult problem of recognizing unsegmented cursive handwriting; in 2009 they won several handwriting recognition competitions (search the site for Schmidhuber's postdoc Alex Graves). In fact, this was the first RNN ever to win an official international pattern recognition contest. To our knowledge, it also was the first Very Deep Learner ever (recurrent or not) to win such a competition.

2. Speech recognition: Stacks of LSTM RNNs are also used for keyword spotting in speech (2007-2009, Santiago Fernandez). They also set the benchmark record on the famous TIMIT speech database (Graves et al, ICASSP 2013). Google used LSTM RNNs to improve large vocabulary speech recognition (Sak et al., Interspeech 2014) and machine translation (Sutskever et al., NIPS 2014).

3. Reinforcement learning robots in partially observable environments (Bram Bakker and Faustino Gomez, 2001-2006).

4. Metalearning of fast online learning algorithms; protein structure prediction (Sepp Hochreiter, 2001-05)

5. Music improvisation and music composition (Doug Eck, 2002-04)

6. More speech recognition (Fred Cummins, Nicole Beringer, Alex Graves, Santiago Fernandez, 2000-), e.g., fast retraining on new data (impossible with HMMs).

7. Time series prediction through Evolino, with Daan Wierstra, Matteo Gagliolo, Faustino Gomez.

8. Recognition of regular / context free / context sensitive languages (Felix Gers, 2000)

(Hochreiter, Cummins, Eck, and Gers went on to become professors)

Our impact on the world's most valuable public companies: Apple (#1), Alphabet (Google, #2), Microsoft (#3), Amazon (#5), ...
The RNN Book
NIPS 2016 Symposium on RNNs
Brainstorm Open Source Software for Neural Networks
Handwriting Recognition with Fast Deep Neural Nets & LSTM Recurrent Nets (Juergen Schmidhuber)
Deep Learning
Deep Learning in Neural Networks: an Overview
Computer Vision with Fast Deep Neural Nets Etc Yield Best Results on Many Visual Pattern Recognition Benchmarks
Who Invented Backpropagation?
My first Deep Learner of 1991 + Deep Learning timeline 1962-2013
RNN-Evolution
Evolution
Reinforcement Learning
Learning Robots
The formal theory of creativity by Juergen Schmidhuber explains the desire to learn motor skills, to do science, to produce art
Publications
Recipient of the 2016 IEEE CIS Neural Networks Pioneer Award (announced in 2015) for pioneering contributions to deep learning and neural networks.
Evolve RNNs: Evolino page
RNN Evo page
Evo main page
RNN book
RL page
AI page


Check out the NIPS 2003 RNNaissance workshop

A typical LSTM cell (right) is very simple. At its core there is a linear unit or neuron (orange). At any given time it just sums up the inputs that it sees via its incoming weighted connections. Its self-recurrent connection has a fixed weight of 1.0 (except when modulated - via the violet dot - through the left green unit, which is not mandatory and which we may ignore for the moment). The 1.0 weight overcomes THE major problem of previous RNNs by making sure that training signals "from the future" cannot vanish as they are "propagated back in time" (if this jargon does not make sense to you, please consult some of the RNN papers below). Suffice it to say here that this simple linear unit is THE reason why LSTM nets can learn to discover the importance of events that happened 1000 discrete time steps ago, while previous RNNs already fail when time lags exceed as few as 10 steps.
typical LSTM cell
LSTM networks consist of many connected LSTM cells such as this one. The LSTM learning algorithm is very efficient - not more than O(1) per time step and weight.
The linear unit lives in a cloud of nonlinear adaptive units needed for learning nonlinear behavior. Here we see an input unit (blue) and three (green) gate units; small violet dots are products. The gates learn to protect the linear unit from irrelevant input events and error signals.
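
A minimal Python sketch of one step of such a memory cell (illustrative pseudocode for the figure above, not a reference implementation): the orange linear unit c carries a self-connection of weight 1.0, optionally modulated by the forget gate, and the green gates and violet product dots appear as elementwise multiplications. Each weight is touched once per step, consistent with the O(1) cost per time step and weight.

```python
# Illustrative sketch of one step of a single LSTM memory cell, matching the figure:
# the orange linear unit c has a self-connection of weight 1.0 (modulated by the
# optional forget gate), and the violet dots are the elementwise products.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, b):
    """x: input vector; h_prev, c_prev: previous cell output and cell state (scalars);
    W, b: weight vectors / biases for the input unit and the three gates."""
    z = np.concatenate([x, [h_prev]])               # inputs seen via weighted connections
    g = np.tanh(W['g'] @ z + b['g'])                # blue input unit (squashed weighted sum)
    i = sigmoid(W['i'] @ z + b['i'])                # input gate: blocks irrelevant inputs
    f = sigmoid(W['f'] @ z + b['f'])                # forget gate on the 1.0 self-connection
    o = sigmoid(W['o'] @ z + b['o'])                # output gate: shields others from the cell
    c = f * c_prev + i * g                          # linear unit: keeps error signals alive
    h = o * np.tanh(c)                              # cell output
    return h, c                                     # each weight used once: O(1) per step and weight
```
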
Selected invited talks on Recurrent Networks etc:
Nov 2014: keynote for INNS-CIIS 2014
Nov 2014: plenary for ICONIP 2014
Aug 1-19, 2014: a dozen talks in New York and the Bay Area, some of them videotaped: Machine Learning meetup in the Empire State Building (youtube, vimeo), IBM Watson, Yahoo, SciHampton, Google Palo Alto (youtube), SciFoo @ Googleplex, Stanford University, Machine Learning meetup San Francisco (vimeo), ICSI (youtube), UC Berkeley. Talk slides.
Jun 2014: plenary for KAIST 2014, Korea
Feb 2014: ETHZ ML meetup - talk video (14 April) and slides
Sep 2013: CIG keynote, Niagara Falls, Canada
Dec 2012: Bionetics keynote
Oct 2012: IScIDE keynote, Nanjing, China
Sep 2011: IJCNN keynote, San Jose, CA
Sep 2010: Banquet talk for ECML / PKDD 2010, Barcelona
Oct 2009: EUCogII keynote, Hamburg
Oct 2009: Singularity Summit, NYC
Mar 2009: AGI keynote, Washington
Sep 2008: ICANN keynote, Prague
Jan 2008: Dagstuhl RNN meeting
Oct 2007: ALT 2007 & DS 2007, Sendai, Japan
Sept 12 2005: ICANN 2005 (plenary)
July 5 2005: Summer School NN2005: Porto, Portugal
June 18 2005: Neuro-IT Summer School, Venice
Nov 9 2004: Plenary talk, ANNIE 2004, St. Louis, US
July 12 2004: Summer School NN2004: Porto, Portugal
Feb 21 2004: Symposium on Human Language, Newcastle upon Tyne, UK


Theses on recurrent neural networks (in German):

2. J. Schmidhuber. Netzwerkarchitekturen, Zielfunktionen und Kettenregel (Network architectures, objective functions, and chain rule). Habilitation (postdoctoral thesis - qualification for a tenure professorship), Institut für Informatik, Technische Universität München, 1993 (496 K). PS.GZ. PDF. HTML.

1. J.  Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, Institut für Informatik, Technische Universität München, 1990. PS.GZ. PDF. HTML.

German home


Our Recurrent Support Vector Machines (recurrent SVMs) also use an LSTM feedback network architecture:

J. Schmidhuber, M. Gagliolo, D. Wierstra, F. Gomez. Evolino for Recurrent Support Vector Machines. TR IDSIA-19-05, v2, 15 Dec 2005. PDF. (Short version at ESANN 2006.) Full paper: Neural Computation 19(3): 757-779, 2007. PDF. Compare Evolino overview.
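
A minimal sketch of the Evolino principle behind this work (illustrative, not the authors' code): the recurrent weights of each candidate network are evolved, while the output weights are computed from data, here by least-squares regression on the recorded hidden activations; the recurrent SVM variant replaces this linear readout with a support vector machine.

```python
# Illustrative sketch of the Evolino principle: evolve the recurrent weights, but
# compute the output weights by linear regression on the recorded hidden activations.
import numpy as np

def evaluate_candidate(W_xh, W_hh, xs, targets):
    """Fitness of one evolved candidate: run the net, fit a linear readout, return error."""
    h = np.zeros(W_hh.shape[0])
    H = []
    for x in xs:                                    # collect hidden states over the sequence
        h = np.tanh(W_xh @ x + W_hh @ h)
        H.append(h)
    H = np.array(H)                                 # shape (T, n_hidden)
    W_out, *_ = np.linalg.lstsq(H, targets, rcond=None)   # optimal linear output weights
    error = np.mean((H @ W_out - targets) ** 2)
    return error                                    # evolution minimizes this error
```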

Support vector machine graphics from the book of Cristianini & Shawe-Taylor.
Additional Recurrent Network Journal Publications (not on LSTM):

10. R. K. Srivastava, B. Steunebrink, J. Schmidhuber. First Experiments with PowerPlay. Neural Networks, 2013. ArXiv preprint (2012): arXiv:1210.8385 [cs.AI].

9. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks 23(2), 2010. PDF.

8. T. Rückstiess, F. Sehnke, T. Schaul, D. Wierstra, S. Yi, J. Schmidhuber. Exploring Parameter Space in Reinforcement Learning. Paladyn Journal of Behavioral Robotics, 2010. PDF.

7. D. Wierstra, A. Förster, J. Peters, J. Schmidhuber. Recurrent Policy Gradients. Logic Journal of IGPL, 18:620-634, 2010 (doi:10.1093/jigpal/jzp049; advance access published 2009). PDF.

6. S. Hochreiter and J. Schmidhuber. Flat Minima. Neural Computation, 9(1):1-42, 1997. PS.GZ. HTML. (Has just a little bit on RNNs.)

5. J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. PS.GZ. PDF. HTML.

4. J. Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243-248, 1992. PS.GZ. PDF. HTML.

3. J. Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Neural Computation, 4(1):131-139, 1992. PS.GZ. PDF. HTML. Pictures (German).

2. J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991 (figures omitted!). PS.GZ. PDF . HTML. HTML overview with pictures.

1. J. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989. (The Neural Bucket Brigade - figures omitted!). PS.GZ. PDF. HTML.

Selected conference publications on LSTM and other RNNs / feedback networks:

52. M. Stollenga, W. Byeon, M. Liwicki, J. Schmidhuber. Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation. Advances in Neural Information Processing Systems (NIPS), 2015, in press. Preprint: arxiv:1506.07452.

51. K. Greff, R. K. Srivastava, J. Schmidhuber. Training Very Deep Networks. Advances in Neural Information Processing Systems (NIPS), 2015, in press. Preprint: arxiv:1505.00387.

50. J. Koutnik, K. Greff, F. Gomez, J. Schmidhuber. A Clockwork RNN. Proc. 31st International Conference on Machine Learning (ICML), p. 1845-1853, Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE].

49. M. Stollenga, J.Masci, F. Gomez, J. Schmidhuber. Deep Networks with Internal Selective Attention through Feedback Connections. Preprint arXiv:1407.3068 [cs.CV]. Advances in Neural Information Processing Systems (NIPS), 2014.

48. J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, 2013. PDF.

47. J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based TORCS. In Foundations of Digital Games (FDG), Chania, Crete, 2013. PDF.

46. R. K. Srivastava, B. Steunebrink, M. Stollenga, J. Schmidhuber. Continually Adding Self-Invented Problems to the Repertoire: First Experiments with PowerPlay. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012. PDF.

45. F. Gomez, J. Koutnik, J. Schmidhuber. Compressed Network Complexity Search. In C. Coello Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, M. Pavone, eds., 12th Int. Conf. on Parallel Problem Solving from Nature - PPSN XII, Taormina, 2012. Nominated for best paper award. PDF.

44. R. K. Srivastava, F. Gomez, J. Schmidhuber. Generalized Compressed Network Search. In C. Coello Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, M. Pavone, eds., 12th Int. Conf. on Parallel Problem Solving from Nature - PPSN XII, Taormina, 2012. PDF.

43. M. Ring, T. Schaul, J. Schmidhuber. The Two-Dimensional Organization of Behavior. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

42. J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, A. Graves. On Fast Deep Nets for AGI Vision. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

41. L. Gisslen, M. Luciw, V. Graziano, J. Schmidhuber. Sequential Constant Size Compressors and Reinforcement Learning. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF. Kurzweil Prize for Best AGI Paper 2011.

40. T. Glasmachers, T. Schaul, Sun Yi, D. Wierstra, J. Schmidhuber. Exponential Natural Evolution Strategies. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2010), Portland, 2010. PDF. GECCO 2010 best paper nomination.

39. J. Koutnik, F. Gomez, J. Schmidhuber (2010). Evolving Neural Networks in Compressed Weight Space. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2010), Portland, 2010. PDF.

38. J. Koutnik, F. Gomez, J. Schmidhuber. Searching for Minimal Neural Networks in Fourier Space. The 3rd Conference on Artificial General Intelligence (AGI-10), 2010. PDF.

37. M. Grüttner, F. Sehnke, T. Schaul, J. Schmidhuber. Multi-Dimensional Deep Memory Go-Player for Parameter Exploring Policy Gradients. Proceedings of the International Conference on Artificial Neural Networks (ICANN-2010), Greece, 2010.

36. A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, p 545-552, Vancouver, MIT Press, 2009. PDF.

35. J. Bayer, D. Wierstra, J. Togelius, J. Schmidhuber. Evolving memory cell structures for sequence learning. Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN-09), Cyprus, 2009. PDF.

34. J. Unkelbach, S. Yi, J. Schmidhuber. An EM based training algorithm for recurrent neural networks. Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN-09), Cyprus, 2009. PDF.

33. A. Graves, S. Fernandez, M. Liwicki, H. Bunke, J. Schmidhuber. Unconstrained online handwriting recognition with recurrent neural networks. Advances in Neural Information Processing Systems 21, NIPS'21, p. 577-584, MIT Press, Cambridge, MA, 2008. PDF.

32. T. Rückstiess, M. Felder, J. Schmidhuber. State-Dependent Exploration for Policy Gradient Methods. 19th European Conference on Machine Learning ECML, 2008. PDF.

31. T. Schaul and J. Schmidhuber. A Scalable Neural Network Architecture for Board Games. Proceedings of the 2008 IEEE Symposium on Computational Intelligence in Games CIG-2008, Perth, Australia, 2008, in press. PDF.

30. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, and J. Schmidhuber. Policy gradients with parameter-based exploration for control. In V. Kurkova, R. Neruda, J. Koutnik, editors, Proceedings of the International Conference on Artificial Neural Networks ICANN 2008, Prague, LNCS 5163, pages 387-396. Springer-Verlag Berlin Heidelberg, 2008. PDF.

29. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Fitness Expectation Maximization. Proceedings of Parallel Problem Solving from Nature PPSN-2008, Dortmund, 2008. PDF.

28. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Natural Evolution Strategies. Proceedings of IEEE Congress on Evolutionary Computation CEC-2008, Hong Kong, 2008. PDF.

27. D. Wierstra, J. Schmidhuber. Policy Gradient Critics. 18th European Conference on Machine Learning ECML, Warsaw, 2007. PDF.

26. M. Liwicki, A. Graves, H. Bunke, J. Schmidhuber. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. 9th International Conference on Document Analysis and Recognition, 2007. PDF.

25. S. Fernandez, A. Graves, J. Schmidhuber. An application of recurrent neural networks to discriminative keyword spotting. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. PDF.

24. A. Graves, S. Fernandez, J. Schmidhuber. Multi-Dimensional Recurrent Neural Networks. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. Preprint: arxiv:0705.2011. PDF.

23. D. Wierstra, A. Foerster, J. Schmidhuber. Solving Deep Memory POMDPs with Recurrent Policy Gradients. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007.

22. A. Foerster, A. Graves, J. Schmidhuber. RNN-based Learning of Compact Maps for Efficient Robot Localization. 15th European Symposium on Artificial Neural Networks, ESANN, Bruges, Belgium, 2007 PDF.

21. S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. 20th International Joint Conference on Artificial Intelligence (IJCAI 07), p. 774-779, Hyderabad, India, 2007 (talk). PDF.

20. F. Gomez, J. Schmidhuber, and R. Miikkulainen (2006). Efficient Non-Linear Control through Neuroevolution. Proceedings of the European Conference on Machine Learning (ECML-06, Berlin). PDF.

19. H. Mayer, F. Gomez, D. Wierstra, I. Nagy, A. Knoll, and J. Schmidhuber (2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. Proceedings of the International Conference on Intelligent Robotics and Systems (IROS-06, Beijing). PDF.

18. A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Proceedings of the International Conference on Machine Learning (ICML-06, Pittsburgh), 2006. PDF.

17. J. Schmidhuber and D. Wierstra and F. J. Gomez. Hybrid Neuroevolution/Regression Search for Sequence Prediction. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005. PDF.

16. D. Wierstra and F. Gomez and J. Schmidhuber. Modeling systems with internal state using Evolino. In Proc. of the 2005 conference on genetic and evolutionary computation (GECCO), Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005. (Got a GECCO best paper award). PDF.

15. F. Gomez and J. Schmidhuber. Co-evolving recurrent neurons learn deep memory POMDPs. In Proc. of the 2005 conference on genetic and evolutionary computation (GECCO), Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005. (Nominated for a best paper award). PDF.

14. F. J. Gomez and J. Schmidhuber. Evolving modular fast-weight networks for control. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 383-389, Springer-Verlag Berlin Heidelberg, 2005. Featuring a 3-wheeled reinforcement learning robot with distance sensors that learns without a teacher to balance a jointed pole indefinitely in a confined 3D environment. PDF.

13. A. Graves, S. Fernandez, and J. Schmidhuber. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 799-804, Springer-Verlag Berlin Heidelberg, 2005. PDF.

12. N. Beringer and A. Graves and F. Schiel and J. Schmidhuber. Classifying unprompted speech by retraining LSTM Nets. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3696, pp. 575-581, Springer-Verlag Berlin Heidelberg, 2005. PDF.

11. A. Graves and J. Schmidhuber. Framewise Phoneme Classification with Bidirectional LSTM Networks. In Proc. International Joint Conference on Neural Networks IJCNN'05, 2005. PDF.

10. A. Graves, D. Eck and N. Beringer, J. Schmidhuber. Biologically Plausible Speech Recognition with LSTM Neural Nets. In J. Ijspeert (Ed.), First Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland, p. 175-184, 2004. PDF .

9. A. Graves, N. Beringer, J. Schmidhuber. A Comparison Between Spiking and Differentiable Recurrent Neural Networks on Spoken Digit Recognition. In Proc. 23rd International Conference on Modelling, Identification, and Control (IASTED), 2004. PDF.

8. B. Bakker, V. Zhumatiy, G. Gruener, and J. Schmidhuber. A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations (PDF). In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS2003, 2003.

7. B. Bakker and J. Schmidhuber. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization (PDF). In F. Groen, N. Amato, A. Bonarini, E. Yoshida, and B. Kröse (Eds.), Proceedings of the 8-th Conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, p. 438-445, 2004.

6. B. Bakker, F. Linaker, J. Schmidhuber. Reinforcement Learning in Partially Observable Mobile Robot Domains Using Unsupervised Event Extraction. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, 2002. PDF .

5. B. Bakker. Reinforcement Learning with Long Short-Term Memory. Advances in Neural Information Processing Systems 13 (NIPS'13), 2002. (On J. Schmidhuber's CSEM grant 2002.)

4. D. Eck and J. Schmidhuber. Learning The Long-Term Structure of the Blues. In J. Dorronsoro, ed., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'02, Madrid, pages 284-289, Springer, Berlin, 2002. PDF.

3. M. Klapper-Rybicka, N. N. Schraudolph, J. Schmidhuber. Unsupervised Learning in LSTM Recurrent Neural Networks. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 684-691, Springer, 2001. PDF.

2. S. Hochreiter and J. Schmidhuber. LSTM can solve hard long time lag problems. In M. C. Mozer, M. I. Jordan, T. Petsche, eds., Advances in Neural Information Processing Systems 9, NIPS'9, pages 473-479, MIT Press, Cambridge MA, 1997. PDF . HTML.

1. S. Hochreiter and J. Schmidhuber. Bridging long time lags by weight guessing and ``Long Short-Term Memory''. In F. L. Silva, J. C. Principe, L. B. Almeida, eds., Frontiers in Artificial Intelligence and Applications, Volume 37, pages 65-72, IOS Press, Amsterdam, Netherlands, 1996.

videos of talks on deep learning in the US
Evolino for time series prediction
2011: First Superhuman Visual Pattern Recognition
Pybrain Machine Learning Library for Robot Learning
Artificial Music Composition
Attentive vision
Fast Weights

Subgoal learning
Subgoal learning with RNNs

Additional Recurrent Network Book Chapters:

2. J.  Schmidhuber, S. Hochreiter, Y. Bengio. Evaluating benchmark problems by random guessing. In S. C. Kremer and J. F. Kolen, eds., A Field Guide to Dynamical Recurrent Neural Networks. IEEE press, 2001. PDF . HTML.

1. S. Hochreiter, Y. Bengio, P. Frasconi, J.  Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, eds., A Field Guide to Dynamical Recurrent Neural Networks. IEEE press, 2001. PDF . HTML.


Top: states of a speech-processing LSTM network that hears the word "seventy-two", e.g., ref [10] above.
Bottom: LSTM music compositions: chords from ref [4] above.


More recurrent neural network conference publications (additional RNN publications can be found in Schmidhuber's full publication list):

13. J.  Schmidhuber. A self-referential weight matrix. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer, 1993. PDF . HTML.

12. J.  Schmidhuber. Reducing the ratio between learning complexity and number of time-varying variables in fully recurrent nets. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 460-463. Springer, 1993. PDF. HTML.

11. J.  Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

10. J.  Schmidhuber. Learning unambiguous reduced sequence descriptions. In J. E. Moody, S. J. Hanson, and R. P. Lippman, editors, Advances in Neural Information Processing Systems 4, NIPS'4, pages 291-298. San Mateo, CA: Morgan Kaufmann, 1992. PDF . HTML.

9. J.  Schmidhuber. Reinforcement learning in Markovian and non-Markovian environments. In D. S. Lippman, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, NIPS'3, pages 500-506. San Mateo, CA: Morgan Kaufmann, 1991. PDF . HTML.

8. J.  Schmidhuber. Adaptive decomposition of time. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 909-914. Elsevier Science Publishers B.V., North-Holland, 1991.

7. J.  Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991. PDF . HTML.

6. J.  Schmidhuber. Learning algorithms for networks with internal and external feedback. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton, editors, Proc. of the 1990 Connectionist Models Summer School, pages 52-61. San Mateo, CA: Morgan Kaufmann, 1990.

5. J.  Schmidhuber. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In Proc. IEEE/INNS International Joint Conference on Neural Networks, San Diego, volume 2, pages 253-258, 1990.

4. J.  Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Institut für Informatik, Technische Universität München, February 1990 (revised in November).

3. J.  Schmidhuber. Reinforcement learning with interacting continually running fully recurrent networks. In Proc. INNC International Neural Network Conference, Paris, volume 2, pages 817-820, 1990.

2. J.  Schmidhuber. Recurrent networks adjusted by adaptive critics. In Proc. IEEE/INNS International Joint Conference on Neural Networks, Washington, D. C., volume 1, pages 719-722, 1990.

1. J.  Schmidhuber. The neural bucket brigade. In R. Pfeifer, Z. Schreter, Z. Fogelman, and L. Steels, editors, Connectionism in Perspective, pages 439-446. Amsterdam: Elsevier, North-Holland, 1989.

Attentive vision
Early work on vision with RNNs