The 2-net chunking system (the one with the potential for collapsing levels) was also tested against the conventional recurrent net algorithms. See details in . With the conventional algorithms, with various learning rates, and with more than 1,000,000 training sequences, it was not possible to obtain a significant performance improvement on a prediction task involving as few as 20 time steps between relevant events.
The 2-net chunking system, however, was able to solve the task rather quickly. An efficient approximation of the BPTT method was applied to both the chunker and the automatizer: only 3 iterations of error propagation `back into the past' were performed at each time step. Most of the test runs required fewer than 5000 training sequences. Nevertheless, the final weight matrix of the automatizer often looked like the one that one would hope to get from the conventional algorithm: there were hidden units which learned to bridge the 20-step time lags by means of strong self-connections. The chunking system needed less computation per time step than the conventional method, yet it required far fewer training sequences.
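To make the idea of propagating errors only a few steps `back into the past' concrete, the following is a minimal sketch, not the implementation used in the experiments. It assumes a single tanh unit with a self-connection and a squared error at the final time step only; the names `forward`, `truncated_grad_w`, `w`, `u`, and `k` are hypothetical and chosen for illustration.

```python
import math


def forward(w, u, xs):
    """Run one tanh unit with self-connection w and input weight u.

    Returns the list of states hs, where hs[t] is the state after t inputs
    and hs[0] is the initial state (assumed to be 0).
    """
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def truncated_grad_w(w, u, xs, target, k):
    """Gradient of 0.5 * (h_T - target)**2 with respect to w,
    propagating the error back through at most k time steps."""
    hs = forward(w, u, xs)
    T = len(xs)
    delta = hs[T] - target  # error with respect to the final state h_T
    grad = 0.0
    for t in range(T, max(T - k, 0), -1):
        delta_net = delta * (1.0 - hs[t] ** 2)  # error at the net input of step t
        grad += delta_net * hs[t - 1]           # contribution of w at step t
        delta = delta_net * w                   # error passed back to h_{t-1}
    return grad


if __name__ == "__main__":
    xs = [1.0, 0.0, 0.0, 0.0, 0.0]  # one relevant input followed by a time lag
    full = truncated_grad_w(0.9, 0.5, xs, target=0.2, k=len(xs))
    short = truncated_grad_w(0.9, 0.5, xs, target=0.2, k=3)
    print("full gradient:     ", full)
    print("3-step truncation: ", short)
```

With `k` equal to the sequence length the function computes the full BPTT gradient for this toy unit; with `k = 3` it drops the contributions of earlier time steps, which costs less per step but only approximates the gradient. In this toy setting the error passed back is scaled by `w` and the tanh derivative at every step, so a strong self-connection is what lets the signal from an early input persist across a long lag.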