In principle, RNNs are the most general and powerful sequence learning method currently available. Unlike Hidden Markov Models (HMMs, the most successful technique in several sequence processing applications; see [3] for a review), they are not limited to discrete internal states but allow for continuous, distributed representations of sequences. Hence they can solve tasks no other current method can solve (e.g., [10]). The vanishing gradient problem, however, makes conventional RNNs hard to train, and we suspect this is why successful real-world applications of feedforward neural networks still far outnumber those of RNNs. Some of the remedies outlined in this chapter may lead to more effective learning systems. However, long time lag research still seems to be in an early stage: no commercial applications of any of these methods have been reported so far. A minimal numerical sketch of the underlying gradient decay follows below.
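To make the vanishing gradient problem concrete, here is a small numerical sketch (illustrative only; the network size, weight scale, and time horizon are arbitrary choices, not parameters from this chapter). For a simple recurrence h_t = tanh(W h_{t-1}), the Jacobian of h_t with respect to h_0 is a product of factors diag(1 - h_t^2) W, so its norm shrinks roughly geometrically over time when the recurrent weights are small:

    import numpy as np

    # Sketch of vanishing gradients in a plain tanh RNN (assumed toy
    # setup, not an experiment from this chapter).
    rng = np.random.default_rng(0)
    n = 20
    W = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # small recurrent weights
    h = rng.normal(size=n)   # initial hidden state h_0
    J = np.eye(n)            # accumulated Jacobian d h_t / d h_0
    for t in range(1, 51):
        h = np.tanh(W @ h)
        J = np.diag(1.0 - h**2) @ W @ J   # chain rule: one more time step
        if t % 10 == 0:
            print(f"q={t:2d}   ||d h_t / d h_0||_2 = {np.linalg.norm(J, 2):.3e}")

Running this prints Jacobian norms that drop by several orders of magnitude within a few dozen steps, which is exactly the exponential decay of error flow analyzed in this chapter.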
Long time lags pose problems for any soft computing method, not just RNNs. For instance, when dealing with long sequences (e.g., speech or biological data), HMMs mostly rely on a localized representation of time, realized by highly constrained, non-ergodic transition diagrams in which different states are designed for different portions of a sequence (see the illustrative left-to-right topology below). Belief propagation across long time lags then does not occur effectively, a phenomenon called diffusion of credit [5], which closely resembles the vanishing gradient problem in RNNs.
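As an illustration of such a constrained topology, consider a standard four-state left-to-right HMM (a textbook example from the speech literature, not one taken from [5]): each state may only persist or advance to its immediate successor, so the transition matrix is banded,

\[
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 \\
0 & a_{22} & a_{23} & 0 \\
0 & 0 & a_{33} & a_{34} \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad a_{ii} + a_{i,i+1} = 1 .
\]

State $i$ can therefore be occupied only during the $i$-th portion of the sequence, which is what makes the representation of time localized; a fully ergodic model would instead allow all sixteen transitions.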