Sorry for the delay!
Unfortunately, the RNN book is a bit delayed because the field is moving so rapidly. However, the
Deep Learning Overview (Schmidhuber, 2015) also serves as an RNN survey. Useful algorithms for supervised, unsupervised, and reinforcement learning with RNNs are covered in Sec. 5.5, 5.5.1, 5.6.1, 5.9, 5.10, 5.13, 5.16, 5.17, 5.20, 5.22, 6.1, 6.3, 6.4, 6.6, and 6.7.
PREFACE (January 2011).
Soon after the birth of modern computer science around the 1930s,
two fundamental questions arose: 1. How can computers learn useful
programs from experience, as opposed to being programmed by human
programmers? 2. How can one program parallel multiprocessor machines,
as opposed to traditional serial architectures? Both questions have
triggered enormous research efforts over the past 70 years, yet are
more pressing than ever. The simple architectures and algorithms
in this book will tackle them, trying to catch two big birds with
one stone.
To build efficient adaptive problem solvers for
tasks ranging from robot control to prediction
and sequential pattern recognition, we will investigate the
highly promising concept of artificial recurrent
neural networks, or simply RNN. They allow for both parallel and
sequential computation, and in principle can compute anything a
traditional computer can compute. Unlike traditional computers,
however, RNN are similar to the human brain, which is a large
feedback network of connected neurons that somehow can learn to
translate a lifelong sensory input stream into a sequence of useful
motor outputs. The brain is a remarkable role model as it can solve
many problems current machines cannot yet solve.
Our goal is not to build detailed brain models though. We leave
this task to neuroscientists. Some of them tend to focus on wetware
details such as individual neurons and synapses, akin to electrical
engineers focusing on hardware details such as characteristic curves
of transistors, although the transistor's main raison d'être is its
value as a simple binary switch. Others study large scale phenomena
such as brain region activity during thought, akin to physicists
monitoring the time-varying heat distribution of a microprocessor,
possibly without realizing the simple nature of a quicksort program
running on it.
This book will adopt the algorithmic point of view instead. We will
reduce operations of computational nodes and connections to their
essentials, stripping them bare of all wetware-specific or
hardware-specific features that are not shown to be relevant for
problem solving, and ask: How can networks of such simple
nodes learn parallel-sequential programs solving complex tasks,
with or without a teacher? Although the answers may inspire future
research on both brains and microprocessors, the language to discuss
such questions is not the one of neurophysiology, electrical
engineering, or physics, but the abstract language of mathematics
and algorithms, in particular, machine learning.
Most traditional machine learning methods, however, are much more
limited than RNN. In particular, unlike the popular artificial feedforward
neural networks (FNN) and Support Vector Machines (SVM), RNN can
not only deal with stationary input and output patterns but also
with pattern sequences of arbitrary length. In fact, while FNN and
SVM have been extremely successful in restricted applications, they
assume that all their inputs are stationary and independent of each
other. In the real world this is unrealistic: normally past events
influence future events. A temporary memory of things that happened
a while ago may be essential for producing a useful output action
later. RNN can implement arbitrary types of such short-term memories
or internal states by means of their recurrent connections; FNN and
SVM cannot. In fact, RNN can implement real sequence-processing and
sequence-producing programs with loops and temporary variables,
while FNN and SVM are limited to simple feedforward mappings from
inputs to outputs. Therefore RNN can solve many tasks unsolvable
by FNN and SVM. RNN are also more
powerful than widely used probabilistic sequence processors such
as Hidden Markov Models (HMM), which are unable to compactly encode
complex memories of previous events. And unlike traditional methods
for automatic sequential program synthesis, RNN can learn programs
that mix sequential and parallel information processing in a natural
and efficient way, exploiting the massive parallelism viewed as
crucial for sustaining the rapid decline of computation cost observed
over the past 70 years.
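To make the key difference concrete, here is a minimal sketch of the recurrent state update that gives RNN their short-term memory. It is written in Python with NumPy; the weight names (W_in, W_rec, W_out), the tanh nonlinearity, and the function rnn_step are illustrative assumptions for an Elman-style recurrence, not this book's specific notation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 3, 5, 2

    W_in = rng.standard_normal((n_hidden, n_in)) * 0.1       # input -> hidden
    W_rec = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden -> hidden: the feedback loop
    W_out = rng.standard_normal((n_out, n_hidden)) * 0.1     # hidden -> output

    def rnn_step(x_t, h_prev):
        # One time step: the hidden state h carries information from
        # past inputs forward, acting as a learned temporary variable.
        h_t = np.tanh(W_in @ x_t + W_rec @ h_prev)
        y_t = W_out @ h_t
        return h_t, y_t

    # Process a sequence of arbitrary length (here, 7 steps).
    h = np.zeros(n_hidden)
    for x_t in rng.standard_normal((7, n_in)):
        h, y = rnn_step(x_t, h)

Because h is fed back at every step, the output at time t can depend on inputs seen arbitrarily many steps earlier; a feedforward mapping applied to each x_t in isolation has no such internal state.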
RNN research dates back at least to
the 1980s. Various teething troubles, however, prevented early
RNN from outperforming less general approaches on anything but toy
applications. Recent progress has overcome the initial difficulties
and dramatically changed the picture. In the new millennium, RNN
have for the first time given impressive state-of-the-art results
in diverse fields such as complex time series prediction, adaptive
robotics and control, connected handwriting recognition, image
classification, aspects of speech recognition, protein analysis,
and other sequence learning problems, with no end in sight. This
explains the rapidly growing interest in RNN for technical applications,
and provides a major motivation for this book, which fills a gap:
for many years there was no compact description of
the state of the art in the field, which was scattered
across numerous individual articles, many of them from our labs.
(Good textbooks on machine learning, such as Bishop's "Pattern
Recognition", do not have serious chapters on general sequence
learning and RNN.)
Our potential readership
includes researchers and students in the
fields of pattern recognition,
sequence processing, time series analysis, computer vision, robotics,
bioinformatics, financial market prediction, the
learning of programs as opposed to traditional static input-output
mappings, and machine learning / problem solving in general.
The book is self-contained and does not assume any prior knowledge
except elementary mathematics. For example, no prior
knowledge of neural networks is
required. Other sequence processors such as HMM will be explained where
necessary. All algorithms will be derived from first principles.
A glossary at the end of the book compactly summarizes relevant
concepts of statistics, analysis, linear algebra, and algorithmic information theory.
The book is suitable for specialized
courses on program learning, or as supporting material for a general
course on machine learning. Enjoy!