INTRODUCTION

There are two basic methods for performing steepest descent in fully recurrent networks with $n$ non-input units and $m = O(n)$ input units. `Back propagation through time' (BPTT) (e.g. [Williams and Peng, 1990]) requires potentially unlimited storage, growing in proportion to the length of the longest training sequence, but needs only $O(n^2)$ computations per time step. BPTT is the method of choice if training sequences are known to have fewer than $O(n)$ time steps. For training sequences involving many more time steps than $n$, for training sequences of unknown length, and for on-line learning in general, one would like an algorithm with fixed upper bounds on both the storage and the computations required per time step. Such an algorithm is the RTRL algorithm [Robinson and Fallside, 1987; Williams and Zipser, 1989]. It requires only fixed-size storage of order $O(n^3)$ but is computationally expensive: it requires $O(n^4)$ operations per time step. The algorithm described herein also requires $O(n^3)$ storage. Every $O(n)$ time steps it requires $O(n^4)$ operations, but on all other time steps it requires only $O(n^2)$ operations. This cuts the average time complexity per time step to $O(n^3)$.
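The amortized bound can be checked with a short cost-model sketch. This is not the algorithm itself, only a hypothetical operation count illustrating why one $O(n^4)$ step per block of $n$ steps, with $O(n^2)$ on the remaining steps, averages to $O(n^3)$; the constant factors are assumed to be 1 for simplicity.

```python
def average_ops_per_step(n, steps):
    """Average operation count per time step under the assumed cost model:
    every n-th step costs n**4 operations (the expensive update),
    every other step costs n**2 operations (the cheap update)."""
    total = 0
    for t in range(steps):
        if t % n == 0:
            total += n ** 4   # expensive step at each block boundary
        else:
            total += n ** 2   # cheap step
    return total / steps

# Per block of n steps: (n**4 + (n - 1) * n**2) / n = n**3 + n**2 - n,
# i.e. O(n**3) on average, matching the claim in the text.
for n in (10, 20, 40):
    print(n, average_ops_per_step(n, 100 * n), n ** 3)
```

For $n = 10$ over 1000 steps, for instance, the average is $1090$ operations per step, close to $n^3 = 1000$ and far below the $n^4 = 10000$ per-step cost of RTRL under the same model.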



Juergen Schmidhuber 2003-02-13