In principle, RNNs are the most general and powerful sequence learning method currently available. Unlike Hidden Markov Models (HMMs, the most successful technique in several sequence processing applications; see [3] for a review), they are not limited to discrete internal states but allow for continuous, distributed representations of sequences. Hence they can solve tasks no other current method can solve (e.g., [10]). The vanishing gradient problem, however, makes conventional RNNs hard to train, and we suspect this is why successful real-world applications of feedforward neural networks still far outnumber those of RNNs. Some of the remedies outlined in this chapter may lead to more effective learning systems. However, long time lag research still seems to be in an early stage: no commercial applications of any of these methods have been reported so far. A minimal numerical sketch of the underlying gradient decay follows below.
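To make the vanishing gradient problem concrete, here is a small numerical sketch (illustrative only; the network size, weight scale, and time horizon are arbitrary choices, not parameters from this chapter). For a simple recurrence h_t = tanh(W h_{t-1}), the Jacobian of h_t with respect to h_0 is a product of factors diag(1 - h_t^2) W, so its norm shrinks roughly geometrically over time when the recurrent weights are small:

    import numpy as np

    # Sketch of vanishing gradients in a plain tanh RNN (assumed toy
    # setup, not an experiment from this chapter).
    rng = np.random.default_rng(0)
    n = 20
    W = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))  # small recurrent weights
    h = rng.normal(size=n)   # initial hidden state h_0
    J = np.eye(n)            # accumulated Jacobian d h_t / d h_0
    for t in range(1, 51):
        h = np.tanh(W @ h)
        J = np.diag(1.0 - h**2) @ W @ J   # chain rule: one more time step
        if t % 10 == 0:
            print(f"q={t:2d}   ||d h_t / d h_0||_2 = {np.linalg.norm(J, 2):.3e}")

Running this prints Jacobian norms that drop by several orders of magnitude within a few dozen steps, which is exactly the exponential decay of error flow analyzed in this chapter.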
Long time lags pose problems for any soft computing method, not just RNNs. For instance, when dealing with long sequences (e.g., speech or biological data), HMMs mostly rely on a localized representation of time, realized by highly constrained, non-ergodic transition diagrams in which different states are designed for different portions of a sequence (see the illustrative left-to-right topology below). Belief propagation across long time lags then does not occur effectively, a phenomenon called diffusion of credit [5], which closely resembles the vanishing gradient problem in RNNs.
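As an illustration of such a constrained topology, consider a standard four-state left-to-right HMM (a textbook example from the speech literature, not one taken from [5]): each state may only persist or advance to its immediate successor, so the transition matrix is banded,

\[
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 \\
0 & a_{22} & a_{23} & 0 \\
0 & 0 & a_{33} & a_{34} \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad a_{ii} + a_{i,i+1} = 1 .
\]

State $i$ can therefore be occupied only during the $i$-th portion of the sequence, which is what makes the representation of time localized; a fully ergodic model would instead allow all sixteen transitions.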