

The above theoretical investigations indicate a basic limitation of gradient descent as a search procedure for finding optimal weights in an RNN. Several proposals have been made to cope with the problem of long-term dependencies, some attempting to solve the optimization problem using alternative search algorithms, others trying to devise alternative architectures. In the following we give a brief account of these proposals.


Juergen Schmidhuber 2003-02-19