Our previous experimental comparisons (on widely used benchmark problems) with RTRL (e.g., [15]; results compared to the ones in [17]), Recurrent Cascade-Correlation [5], Elman nets (results compared to the ones in [4]), and Neural Sequence Chunking [16] demonstrated that LSTM leads to many more successful runs than its competitors and learns much faster [8]. The following tasks, though, are more difficult than the above benchmark problems: they cannot be solved at all in reasonable time by RS (we tried various architectures), nor by any other recurrent net learning algorithm we are aware of (see [13] for an overview). In the experiments below, gate units (in_j, out_j) and output units are sigmoid in [0,1], h is sigmoid in [-1,1], and g is sigmoid in [-2,2].
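The activation functions above can be sketched as scaled versions of the logistic sigmoid; the following is a minimal illustration, assuming the standard definitions in which g and h are obtained by rescaling the [0,1] logistic to the stated ranges (function names follow the notation in the text):

```python
import math

def f(x):
    # Logistic sigmoid with range (0, 1): used for gate and output units.
    return 1.0 / (1.0 + math.exp(-x))

def g(x):
    # Cell-input squashing function: logistic sigmoid rescaled to (-2, 2).
    return 4.0 * f(x) - 2.0

def h(x):
    # Cell-output squashing function: logistic sigmoid rescaled to (-1, 1).
    return 2.0 * f(x) - 1.0

# At x = 0 the logistic is 0.5, so g and h are centered at 0:
print(f(0.0), g(0.0), h(0.0))  # 0.5 0.0 0.0
```

The differing output ranges matter: the gates must stay in [0,1] so they act as multiplicative switches, while g and h are centered at zero so the cell state can move in either direction.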