**Previous approaches.**
In recent years numerous methods have been proposed
to address this problem. For instance,
Schmidhuber (1992) suggested compressing regular sequences
with the chunking algorithm.
Hochreiter's ideas (1991) led to the recent LSTM algorithm
(Hochreiter and Schmidhuber, 1997a).
Bengio et al. (1994) investigated simulated annealing,
multi-grid random search, time-weighted pseudo-Newton
optimization, and discrete error propagation,
evaluating their performance on tasks with long-term dependencies.
Bengio and Frasconi (1994)
also proposed an EM approach for propagating targets.

**Previous benchmarks.**
Several researchers use the *2-sequence problem*
(and *latch problem*) to
compare various algorithms, e.g.,
Bengio et al. (1994),
Bengio and Frasconi (1994),
El Hihi and Bengio (1995),
Lin et al. (1995).
For the same purpose, some also use the
*parity problem*,
e.g.,
Bengio et al. (1994),
Bengio and Frasconi (1994).
Some of Tomita's (1982) grammars
are also often used as
benchmark problems
for recurrent nets (see, e.g.,
Bengio and Frasconi, 1995;
Watrous and Kuhn, 1992;
Pollack, 1991;
Miller and Giles, 1993;
Manolios and Fanelli, 1994).

**Overview.**
In this paper we will show that some instances of such problems
can be solved much more quickly by random weight guessing (RG).
In other cases, however, RG does not perform well.
This paper's purpose is *not* to suggest that RG is a
good algorithm for training recurrent networks.
Instead, it aims to contribute to an understanding of RG's
significance and shortcomings relative to algorithms designed for long time lags.
It also suggests using RG
as a first test of a benchmark problem's difficulty.
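To make the idea concrete, the following is a minimal sketch of RG on a latch-style toy task, assuming a single self-recurrent tanh unit and a weight range of [-10, 10] (both simplifications of ours; the experiments discussed in this paper use larger networks and harder tasks). RG simply samples all weights at random until a guessed network solves the task.

```python
import math
import random

def run_net(w_in, w_rec, seq):
    """Run a single self-recurrent tanh unit over a sequence;
    return the final activation (used directly as the output)."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def solved(w_in, w_rec, T=5):
    """Latch-style test: the sign of the first symbol (+1 or -1)
    must still be readable from the output after T steps."""
    for s in (1.0, -1.0):
        seq = [s] + [0.0] * (T - 1)
        out = run_net(w_in, w_rec, seq)
        if out == 0.0 or (out > 0) != (s > 0):
            return False
    return True

def random_guess(max_trials=10_000, weight_range=10.0, seed=0):
    """Random weight guessing: sample weights uniformly until the
    task is solved; return (trial count, weights), or None on failure."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        w_in = rng.uniform(-weight_range, weight_range)
        w_rec = rng.uniform(-weight_range, weight_range)
        if solved(w_in, w_rec):
            return trial, (w_in, w_rec)
    return None
```

On a task this simple, a large fraction of random weight vectors already succeed, so RG typically terminates within a handful of trials; this is precisely the effect the paper examines on the published benchmarks.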