** Next:** FINAL REMARKS
** Up:** EVALUATING LONG-TERM DEPENDENCY BENCHMARK
** Previous:** PARITY PROBLEM

Many authors also use Tomita's grammars (1982) to test their algorithms; see, e.g., Bengio and Frasconi (1995), Watrous and Kuhn (1992), Pollack (1991), Miller and Giles (1993), and Manolios and Fanelli (1994).
Since we have already tested parity problems above, we focus here on a few "parity-free" Tomita grammars (grammars #1, #2, and #4).
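The three grammars can be written as simple membership tests over strings of 0s and 1s. The definitions below follow the usual statement of Tomita's grammars in the literature (#1 = 1*, #2 = (10)*, #4 = no three consecutive 0s). The function names are illustrative only. None of these tests needs to count symbols modulo 2, which is presumably the sense in which they are "parity-free".

```python
import re

# Tomita #1: the language 1* (strings made only of 1s, including the empty string).
def tomita1(s: str) -> bool:
    return re.fullmatch(r"1*", s) is not None

# Tomita #2: the language (10)* (zero or more repetitions of "10").
def tomita2(s: str) -> bool:
    return re.fullmatch(r"(10)*", s) is not None

# Tomita #4: binary strings that never contain three consecutive 0s.
def tomita4(s: str) -> bool:
    return re.fullmatch(r"[01]*", s) is not None and "000" not in s

if __name__ == "__main__":
    for w in ["", "111", "1011", "1010", "10010", "100010"]:
        print(repr(w), tomita1(w), tomita2(w), tomita4(w))
```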
Most previous work simplified the learning problem by restricting sequence length. For example, Miller and Giles use a maximal training sequence length of 10 and a maximal test sequence length of 15.
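A length-restricted setup of this kind can be sketched by labeling binary strings up to a given length. Exhaustive enumeration is an assumption made here purely for simplicity (the text does not say how Miller and Giles selected their sequences). `labeled_strings` is a hypothetical helper, and the lambda labels strings by Tomita #1.

```python
from itertools import product
from typing import Callable, List, Tuple

def labeled_strings(accepts: Callable[[str], bool],
                    min_len: int, max_len: int) -> List[Tuple[str, bool]]:
    """Enumerate every binary string with min_len <= length <= max_len,
    paired with its grammar label (hypothetical helper)."""
    data = []
    for n in range(min_len, max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            data.append((s, accepts(s)))
    return data

# Tomita #1 (1*): a binary string is accepted iff it contains no 0.
is_tomita1 = lambda s: "0" not in s

train = labeled_strings(is_tomita1, 0, 10)   # training lengths 0..10
test = labeled_strings(is_tomita1, 11, 15)   # longer test strings, up to 15
print(len(train), len(test))
```

With these settings the training set holds 2,047 strings and the test set 63,488, of which only 11 and 5 respectively are positive for Tomita #1.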
Miller and Giles (1993) report the number of sequences required for convergence (for various first- and second-order nets with 3 to 9 units):
Tomita #1: 23,000 to 46,000;
Tomita #2: 77,000 to 200,000;
Tomita #4: 46,000 to 210,000.
RG, however, performs better in these cases (as always, we use the experimental conditions described in section 2). The average results are:
Tomita #1: 182 (with A1, ) and 288 (with A2);
Tomita #2: 1,511 (with A1, ) and 17,953 (with A2);
Tomita #4: 13,833 (with A1, ) and 35,610 (with A2).
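Random weight guessing itself can be sketched in a few lines. This is a toy under explicit assumptions, not the setup of section 2: a hypothetical single self-recurrent sigmoid unit (not A1 or A2), weights drawn uniformly from [-10, 10], initial activation 1, acceptance when the final activation exceeds 0.5, and a training set of all strings up to length 6 labeled by Tomita #1. The loop redraws all weights at random and stops at the first draw that classifies every training string correctly, returning the number of draws needed.

```python
import math
import random
from itertools import product

def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def run_net(weights, s: str) -> float:
    """Hypothetical one-unit recurrent net: h_0 = 1,
    h_t = sigmoid(w_in * x_t + w_rec * h_{t-1} + b); returns h_T."""
    w_in, w_rec, b = weights
    h = 1.0
    for ch in s:
        h = sigmoid(w_in * int(ch) + w_rec * h + b)
    return h

def classifies_all(weights, data) -> bool:
    # Accept a string iff the final activation exceeds 0.5; stop at the first error.
    return all((run_net(weights, s) > 0.5) == label for s, label in data)

def random_guessing(data, weight_range=10.0, max_trials=100_000, seed=0):
    """Draw all weights uniformly from [-weight_range, weight_range] until the
    net classifies every training string correctly. Returns (weights, trials),
    or (None, max_trials) if no draw succeeds."""
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        weights = tuple(rng.uniform(-weight_range, weight_range) for _ in range(3))
        if classifies_all(weights, data):
            return weights, trial
    return None, max_trials

# All binary strings of length 0..6, labeled by Tomita #1 (no 0 anywhere).
train = [("".join(bits), "0" not in bits)
         for n in range(7) for bits in product("01", repeat=n)]
weights, trials = random_guessing(train)
print("solution after", trials, "trials:", weights)
```

A draw that fits the training set need not generalize to longer strings, so a separate test set is still required, as in the experiments above.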
It should be mentioned, however, that with our architectures and very short training sequences (in the style of Miller and Giles), gradient descent can also achieve reasonable results.

Juergen Schmidhuber
2003-02-19