
EXPERIMENTS

One experiment with a multi-level chunking architecture involved a grammar that produced strings of many $a$'s and $b$'s with local temporal structure within the training strings. The task was to distinguish between certain strings whose endings overlapped. The conventional algorithm failed completely; it was confused by the great number of input sequences with similar endings. Not so the chunking system: it soon discovered hierarchical temporal structure in the input sequences and decomposed the problem such that it was able to solve it within a few hundred thousand training sequences.
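
To make the task concrete, here is a minimal sketch (in Python; the function name and parameters are illustrative assumptions, not taken from the original experiments) of the kind of string pair such a grammar could produce: two strings that differ only in their first symbol but share a long common ending, so a learner attending only to recent inputs cannot separate them.

import random

def make_pair(suffix_len=50):
    """Two strings that differ only in their first symbol but share a long
    common ending; the class-relevant information lies far in the past."""
    shared_suffix = "".join(random.choice("ab") for _ in range(suffix_len))
    return "a" + shared_suffix, "b" + shared_suffix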

The 2-net chunking system (the one with the potential for collapsing levels) was also tested against the conventional recurrent net algorithms; see [21] for details. With the conventional algorithms, various learning rates, and more than 1,000,000 training sequences, no significant performance improvement could be obtained on a prediction task involving as few as 20 time steps between relevant events.
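
A minimal sketch of such a prediction task, assuming a common toy setup (the concrete grammar in [21] may differ): the first symbol of each sequence determines a target that must be predicted 20 steps later, with uninformative filler in between.

import random

def lag_sequence(lag=20):
    """One training sequence: the first symbol determines the symbol that
    must be predicted `lag` steps later; everything in between is filler."""
    key = random.choice("ab")
    return [key] + ["x"] * (lag - 1) + [key]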

In contrast, the 2-net chunking system solved the task rather quickly. An efficient approximation of back-propagation through time (BPTT) was applied to both the chunker and the automatizer: only 3 iterations of error propagation `back into the past' were performed at each time step. Most of the test runs required fewer than 5000 training sequences. Still, the final weight matrix of the automatizer often looked like the one one would hope to obtain from the conventional algorithm: there were hidden units that learned to bridge the 20-step time lags by means of strong self-connections. The chunking system needed less computation per time step than the conventional method, yet it also required far fewer training sequences.
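
The truncation scheme can be sketched as follows. This is a minimal plain-numpy illustration of truncated BPTT with depth 3; the network sizes, learning rate, and the tanh/squared-error choices are assumptions for illustration, not the paper's exact setup.

import numpy as np
from collections import deque

rng = np.random.default_rng(0)
n_in, n_hid, n_out, depth, lr = 4, 8, 4, 3, 0.1   # depth 3 as in the text
W_in = rng.normal(0.0, 0.1, (n_hid, n_in))
W_rec = rng.normal(0.0, 0.1, (n_hid, n_hid))
W_out = rng.normal(0.0, 0.1, (n_out, n_hid))

def train_sequence(xs, ys):
    """Online training on one sequence; at each time step the error is
    propagated only `depth` iterations back into the past."""
    global W_in, W_rec, W_out
    h = np.zeros(n_hid)
    history = deque(maxlen=depth)           # last `depth` (x, h_prev, h) triples
    for x, y in zip(xs, ys):
        h_prev = h
        h = np.tanh(W_in @ x + W_rec @ h_prev)
        history.append((x, h_prev, h))
        err = W_out @ h - y                 # gradient of squared error w.r.t. output
        dh = W_out.T @ err                  # error reaching the current hidden state
        W_out -= lr * np.outer(err, h)
        gW_in = np.zeros_like(W_in)
        gW_rec = np.zeros_like(W_rec)
        for x_t, hp_t, h_t in reversed(history):   # truncated back-propagation
            dz = dh * (1.0 - h_t ** 2)      # through the tanh nonlinearity
            gW_in += np.outer(dz, x_t)
            gW_rec += np.outer(dz, hp_t)
            dh = W_rec.T @ dz               # pass the error one step further back
        W_in -= lr * gW_in
        W_rec -= lr * gW_rec

# toy usage: predict a copy of the input stream, shifted by one step
xs = [np.eye(n_in)[rng.integers(n_in)] for _ in range(30)]
train_sequence(xs[:-1], xs[1:])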

