
PARITY PROBLEM

The task is to classify sequences of 1's and -1's according to whether the number of 1's is even or odd (Bengio et al. 1994; Bengio and Frasconi 1994). The output neuron has a sigmoid activation function; the target at the end of the sequence is 1.0 for odd and 0.0 for even. Bengio et al. (1994) also add uniform noise in [-0.2, 0.2] to the sequence elements.
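
The following Python sketch illustrates how such training sequences can be generated. The function name, the sequence-length argument, and the way noise is applied are illustrative assumptions for this sketch, not the exact setup of the cited papers.

    import numpy as np

    def make_parity_sequence(length, add_noise=False, rng=np.random):
        # Elements are +1 or -1; the target is 1.0 if the number of +1's is odd, else 0.0.
        seq = rng.choice([1.0, -1.0], size=length)
        target = 1.0 if int((seq > 0).sum()) % 2 else 0.0
        if add_noise:
            # Optional uniform noise in [-0.2, 0.2], as in Bengio et al. (1994).
            seq = seq + rng.uniform(-0.2, 0.2, size=length)
        return seq, target

    # Example: a noise-free sequence of 30 steps (length chosen for illustration only).
    x, y = make_parity_sequence(30)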

Previous Results. For sequences with lengths between only 25 and 50 steps, among the six methods tested by Bengio et al. (1994), only simulated annealing was reported to achieve zero final classification error (within about 810000 trials). A method called discrete error BP took about 54000 trials to achieve a final classification error of 0.05. In Bengio and Frasconi's more recent work (1994), for sequences with 250-500 steps, an Input/Output Hidden Markov Model (IOHMM) took about 3400 trials to achieve a final classification error of 0.12. Comparable results were obtained with and without noise.

Results with RG. Here two cases are considered. In the first case, the inputs are binary +1's and -1's. In the second case, uniform noise in [-0.2, 0.2] is added to the input. Without noise, RG with architecture A1 ($n=1$, identical to Bengio et al.'s 1994 architecture) solves the problem within 2906 trials on average. This is comparable to or better than the best previously reported results. RG with A2 (no noise) solves it within 2797 trials. With architecture A2, but without self-connections for the hidden units, RG solves the no-noise parity problem within 250 trials on average. This is much better than any previously reported result. Such results are obtained even with sequence lengths exceeding 500.
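
As a rough illustration of random guessing (RG), the following Python sketch repeatedly samples all weights of a small recurrent net at random and counts the trials until every training sequence is classified correctly. The single self-recurrent sigmoid unit, the uniform weight range, and the acceptance criterion (output on the correct side of 0.5) are assumptions made for the sketch; they need not coincide with architectures A1/A2 or the sampling intervals used in the experiments.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def run_net(weights, seq):
        # Tiny recurrent net: one self-recurrent sigmoid hidden unit feeding a sigmoid output.
        w_in, w_rec, b_h, w_out, b_out = weights
        h = 0.0
        for x in seq:
            h = sigmoid(w_in * x + w_rec * h + b_h)
        return sigmoid(w_out * h + b_out)

    def random_guess(training_set, max_trials=100000, rng=np.random):
        # Guess all weights at random until every sequence is classified correctly.
        for trial in range(1, max_trials + 1):
            weights = rng.uniform(-100.0, 100.0, size=5)  # weight range is an assumption
            if all((run_net(weights, seq) > 0.5) == (target == 1.0)
                   for seq, target in training_set):
                return trial  # number of weight vectors drawn before success
        return None

Applied to a set of sequences produced by make_parity_sequence above, random_guess returns the number of randomly drawn weight vectors needed to classify the whole training set, which is the quantity reported as "trials" in the results above.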

In the case of noisy inputs, however, RG (with architecture A1) requires 41400 trials on average. If we ignore the fact that RG trials are much shorter than IOHMM trials, then IOHMM seems faster. This suggests that adding input noise may be a way to break down candidate learning algorithms that rely heavily on random search.

