Sorry for the delay!
Unfortunately, the RNN book is a bit delayed because the field is moving so rapidly. However, the
Deep Learning Overview (Schmidhuber, 2015) also serves as an RNN survey. Useful algorithms for supervised, unsupervised, and reinforcement learning with RNNs are covered in Sec. 5.5, 5.5.1, 5.6.1, 5.9, 5.10, 5.13, 5.16, 5.17, 5.20, 5.22, 6.1, 6.3, 6.4, 6.6, and 6.7.
PREFACE (January 2011).
Soon after the birth of modern computer science around the 1930s,
two fundamental questions arose: 1. How can computers learn useful
programs from experience, as opposed to being programmed by human
programmers? 2. How can one program parallel multiprocessor machines,
as opposed to traditional serial architectures? Both questions have
triggered enormous research efforts over the past 70 years, yet are
more pressing than ever. The simple architectures and algorithms
in this book will tackle them, trying to catch two big birds with
one stone.
To build efficient adaptive problem solvers for
tasks ranging from robot control to prediction
and sequential pattern recognition, we will investigate the
highly promising concept of artificial recurrent
neural networks, or simply RNN. They allow for both parallel and
sequential computation, and in principle can compute anything a
traditional computer can compute. Unlike traditional computers,
however, RNN are similar to the human brain, which is a large
feedback network of connected neurons that somehow can learn to
translate a lifelong sensory input stream into a sequence of useful
motor outputs. The brain is a remarkable role model as it can solve
many problems current machines cannot yet solve.
Our goal is not to build detailed brain models though. We leave
this task to neuroscientists. Some of them tend to focus on wetware
details such as individual neurons and synapses, akin to electrical
engineers focusing on hardware details such as characteristic curves
of transistors, although the transistor's main raison d'être is its
value as a simple binary switch. Others study large scale phenomena
such as brain region activity during thought, akin to physicists
monitoring the time-varying heat distribution of a microprocessor,
possibly without realizing the simple nature of a quicksort program
running on it.
This book will adopt the algorithmic point of view instead. We will
reduce operations of computational nodes and connections to their
essentials, stripping them bare of all wetware-specific or
hardware-specific features that are not shown to be relevant for
problem solving, and ask: How can networks of such simple
nodes learn parallel-sequential programs solving complex tasks,
with or without a teacher? Although the answers may inspire future
research on both brains and microprocessors, the language to discuss
such questions is not the one of neurophysiology, electrical
engineering, or physics, but the abstract language of mathematics
and algorithms, in particular, machine learning.
Most traditional machine learning methods, however, are much more
limited than RNN. In particular, unlike the popular artificial feedforward
neural networks (FNN) and Support Vector Machines (SVM), RNN can
not only deal with stationary input and output patterns but also
with pattern sequences of arbitrary length. In fact, while FNN and
SVM have been extremely successful in restricted applications, they
assume that all their inputs are stationary and independent of each
other. In the real world this is unrealistic: normally past events
influence future events. A temporary memory of things that happened
a while ago may be essential for producing a useful output action
later. RNN can implement arbitrary types of such short-term memories
or internal states by means of their recurrent connections; FNN and
SVM cannot. In fact, RNN can implement real sequence-processing and
sequence-producing programs with loops and temporary variables,
while FNN and SVM are limited to simple feedforward mappings from
inputs to outputs. Therefore RNN can solve many tasks unsolvable
by FNN and SVM. RNN are also more
powerful than widely used probabilistic sequence processors such
as Hidden Markov Models (HMM), which are unable to compactly encode
complex memories of previous events. And unlike traditional methods
for automatic sequential program synthesis, RNN can learn programs
that mix sequential and parallel information processing in a natural
and efficient way, exploiting the massive parallelism viewed as
crucial for sustaining the rapid decline of computation cost observed
over the past 70 years.
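To make the key difference concrete, here is a minimal sketch of the recurrent state update that gives RNN their short-term memory. It is written in Python with NumPy; the weight names (W_in, W_rec, W_out), the tanh nonlinearity, and the function rnn_step are illustrative assumptions for an Elman-style recurrence, not this book's specific notation.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 3, 5, 2

    W_in = rng.standard_normal((n_hidden, n_in)) * 0.1       # input -> hidden
    W_rec = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden -> hidden: the feedback loop
    W_out = rng.standard_normal((n_out, n_hidden)) * 0.1     # hidden -> output

    def rnn_step(x_t, h_prev):
        # One time step: the hidden state h carries information from
        # past inputs forward, acting as a learned temporary variable.
        h_t = np.tanh(W_in @ x_t + W_rec @ h_prev)
        y_t = W_out @ h_t
        return h_t, y_t

    # Process a sequence of arbitrary length (here, 7 steps).
    h = np.zeros(n_hidden)
    for x_t in rng.standard_normal((7, n_in)):
        h, y = rnn_step(x_t, h)

Because h is fed back at every step, the output at time t can depend on inputs seen arbitrarily many steps earlier; a feedforward mapping applied to each x_t in isolation has no such internal state.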
RNN research dates back at least to
the 1980s. Various teething troubles, however, prevented early
RNN from outperforming less general approaches on anything but toy
applications. Recent progress has overcome the initial difficulties
and dramatically changed the picture. In the new millennium, RNN
have for the first time given impressive state-of-the-art results
in diverse fields such as complex time series prediction, adaptive
robotics and control, connected handwriting recognition, image
classification, aspects of speech recognition, protein analysis,
and other sequence learning problems, with no end in sight. This
explains the rapidly growing interest in RNN for technical applications,
and provides a major motivation for this book, which fills a gap:
for many years there was no compact description of
the state of the art in the field, which was scattered
across numerous individual articles, many of them from our labs.
(Good textbooks on machine learning, such as Bishop's "Pattern
Recognition", do not have serious chapters on general sequence
learning and RNN.)
Our potential readership
includes researchers and students in the
fields of pattern recognition,
sequence processing, time series analysis, computer vision, robotics,
bioinformatics, financial market prediction, the
learning of programs as opposed to traditional static input-output
mappings, and machine learning / problem solving in general.
The book is self-contained and does not assume any prior knowledge
except elementary mathematics. For example, no prior
knowledge of neural networks is
required. Other sequence processors such as HMM will be explained where
necessary. All algorithms will be derived from first principles.
A glossary at the end of the book compactly summarizes relevant
concepts of statistics, analysis, linear algebra, and algorithmic information theory.
The book is suitable for specialized
courses on program learning, or as supporting material for a general
course on machine learning. Enjoy!