Following the argument in [Williams and Peng, 1990], continuous-time versions of BPTT and RTRL [Pearlmutter, 1989] [Gherrity, 1989] can serve, by means of Euler discretization, as the basis for a correspondingly efficient continuous-time version of the algorithm presented here.
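A minimal sketch of what Euler discretization of a continuous-time recurrent network looks like follows. For illustration, it assumes leaky-integrator dynamics of the form τ dy_i/dt = −y_i + σ(Σ_j w_ij y_j) + I_i with a logistic σ. This specific form and the function names `sigmoid`, `euler_step`, and `simulate` are hypothetical choices, not taken from this paper. With step size Δt = τ, one Euler step reduces to the ordinary discrete-time update y_i ← σ(Σ_j w_ij y_j) + I_i. This indicates how a discrete-time learning algorithm can be carried over to the discretized continuous-time network.

```python
import math


def sigmoid(x):
    # Logistic squashing function sigma(x) = 1 / (1 + e^{-x}).
    return 1.0 / (1.0 + math.exp(-x))


def euler_step(y, w, inputs, dt, tau=1.0):
    """One forward-Euler step of the (assumed, illustrative) dynamics
    tau * dy_i/dt = -y_i + sigmoid(sum_j w[i][j] * y[j]) + inputs[i]."""
    n = len(y)
    new_y = []
    for i in range(n):
        # Net input to unit i from all units (fully recurrent).
        net = sum(w[i][j] * y[j] for j in range(n))
        # Time derivative of y_i under the assumed dynamics.
        dydt = (-y[i] + sigmoid(net) + inputs[i]) / tau
        # Forward Euler: y(t + dt) ~ y(t) + dt * dy/dt.
        new_y.append(y[i] + dt * dydt)
    return new_y


def simulate(y0, w, input_seq, dt, tau=1.0):
    """Run the discretized network over a sequence of input vectors and
    return the trajectory of activation vectors (including y0)."""
    y = list(y0)
    trajectory = [list(y)]
    for inputs in input_seq:
        y = euler_step(y, w, inputs, dt, tau)
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    w = [[0.5, -1.0], [2.0, 0.1]]
    input_seq = [[0.1, 0.0]] * 5
    # Print a short trajectory of the two-unit network with dt = 0.1.
    for state in simulate([0.0, 0.0], w, input_seq, dt=0.1):
        print([round(v, 4) for v in state])
```

Shrinking `dt` relative to `tau` makes the discrete trajectory follow the continuous one more closely, at the cost of more time steps (and hence more learning-algorithm updates) per unit of continuous time.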
Many typical environments produce input sequences with both local and more global temporal structure. For instance, input sequences are often hierarchically organized (e.g., speech). In such cases, sequence-composing algorithms [Schmidhuber, 1991] [Schmidhuber, 1992] may be superior alternatives to purely gradient-based algorithms.