ABSTRACT.
Let $n$ be the number of time-varying variables for storing temporal
events in a fully recurrent sequence processing network. Let $R_{op}$
be the ratio between the number of operations per time step
(for an exact gradient-based supervised sequence learning algorithm)
and $n$. Let $R_s$ be the ratio between the maximum number of storage
cells necessary for learning arbitrary sequences and $n$. With
conventional recurrent nets, $n$ equals the number of units. With the
popular `real time recurrent learning algorithm' (RTRL),
$R_{op} = O(n^3)$ and $R_s = O(n^2)$. With `back-propagation
through time' (BPTT), $R_{op} = O(n)$ (much better than with RTRL)
and $R_s$ is infinite (much worse than with RTRL). The contribution
of this paper is a novel fully recurrent network and a corresponding
exact gradient-based learning algorithm with $R_{op} = O(n)$ (as good
as with BPTT) and $R_s = O(n^2)$ (as good as with RTRL).
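The ratios above can be illustrated with a back-of-envelope operation-count model. The sketch below is not from the paper: the helper functions `rtrl_costs` and `bptt_costs` are hypothetical, constant factors are ignored, and the counts simply encode the standard complexities for a conventional fully recurrent net with $n$ units and $n^2$ weights (RTRL: $O(n^4)$ operations per step with a fixed $O(n^3)$ sensitivity table; BPTT: $O(n^2)$ operations per step with activation storage growing with sequence length).

```python
# Back-of-envelope cost model (illustrative sketch, not the paper's
# algorithm): asymptotic operation and storage counts for a conventional
# fully recurrent net with n units, ignoring constant factors.

def rtrl_costs(n):
    ops_per_step = n ** 4  # updating all n * n^2 sensitivities, O(n) work each
    storage = n ** 3       # fixed-size sensitivity table (independent of time)
    return ops_per_step, storage

def bptt_costs(n, seq_len):
    ops_per_step = n ** 2    # one pass through the n x n weight matrix
    storage = n * seq_len    # all past activations must be kept: grows with time
    return ops_per_step, storage

n = 10
rtrl_ops, rtrl_store = rtrl_costs(n)
bptt_ops, bptt_store = bptt_costs(n, seq_len=1000)

# Ratios relative to the n time-varying variables of a conventional net:
print(rtrl_ops // n, rtrl_store // n)  # RTRL: R_op ~ n^3, R_s ~ n^2
print(bptt_ops // n)                   # BPTT: R_op ~ n; R_s unbounded in seq_len
```

For arbitrary (unboundedly long) sequences, `bptt_costs` storage has no finite maximum, which is the sense in which the abstract calls BPTT's storage ratio infinite; the paper's contribution is to combine BPTT's operation ratio with RTRL's finite storage ratio.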