You want your neural net algorithm to learn sequences?
Do not just apply conventional
gradient descent (or approximations thereof) to recurrent nets,
time-delay nets, etc.
Instead, use your sequence learning algorithm
to implement the following method:
No matter what your final goals are, train a network to predict
its next input from the previous ones. Since only unpredictable
inputs convey new information,
ignore all predictable inputs but
let all unexpected inputs (plus information
about the time step at which they occurred) become inputs
to a higher-level network of the same kind (working on a
slower, self-adjusting time scale). Go on building a
hierarchy of such networks.
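To make the principle concrete, here is a minimal Python sketch of one level of such a hierarchy. A simple order-1 frequency table stands in for the recurrent predictor, and the names (TablePredictor, compress) are illustrative, not from the original method: whatever the predictor fails to anticipate is forwarded, together with its time step, as input to the next level up.

```python
from collections import defaultdict

class TablePredictor:
    """Order-1 stand-in for a recurrent net: predicts the next symbol
    from the previous one by majority vote over observed transitions."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, prev):
        nxt = self.counts[prev]
        return max(nxt, key=nxt.get) if nxt else None

    def update(self, prev, cur):
        self.counts[prev][cur] += 1

def compress(stream, predictor):
    """Keep only the unexpected symbols, each paired with the time
    step at which it occurred; these feed the next level up."""
    surprises, prev = [], None
    for t, sym in enumerate(stream):
        if predictor.predict(prev) != sym:   # unpredictable -> new information
            surprises.append((t, sym))
        predictor.update(prev, sym)          # keep learning online
        prev = sym
    return surprises

stream = list("abcabcabcabd")
level0 = TablePredictor()
higher_level_input = compress(stream, level0)
print(higher_level_input)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a'), (11, 'd')]
```

Of the twelve input symbols, only five reach the higher level: the unfamiliar opening and the final deviation. The higher level then applies the same procedure to this much shorter stream.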
This principle
compresses the descriptions of event sequences
without loss of information, thus easing subsequent supervised
or reinforcement learning tasks.
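The reduction is lossless because the predictable symbols can be regenerated: replaying the same predictor updates during decoding puts the predictor in exactly the state it had during encoding, so its own guesses fill every gap correctly. Continuing the sketch above (decompress is again an illustrative name):

```python
def decompress(surprises, length, predictor):
    """Rebuild the original stream from the surprises alone: at stored
    time steps take the stored symbol, elsewhere take the predictor's
    guess, which is correct because the predictor is kept in lockstep
    with the one used during compression."""
    stored = dict(surprises)
    stream, prev = [], None
    for t in range(length):
        sym = stored[t] if t in stored else predictor.predict(prev)
        predictor.update(prev, sym)
        stream.append(sym)
        prev = sym
    return stream

assert decompress(higher_level_input, len(stream), TablePredictor()) == stream
```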
Experiments show that systems based on this principle
can require less computation
per time step
and many fewer training sequences than
conventional training algorithms for recurrent nets.
I also discuss a method,
involving only two recurrent networks, that
tries to collapse a multi-level
predictor hierarchy into a single recurrent net.
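A rough sketch of what such a two-network objective could look like, assuming plain tanh RNNs; for simplicity the higher-level "chunker" net is held fixed and stepped at every tick rather than on its slower, self-adjusting time scale, and the shapes, cell type, and loss weight alpha are assumptions, not the paper's exact formulation. The lower "automatizer" net is trained on a combined loss: predict the next input and imitate the chunker's hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_params(n_hidden, n_in):
    """Weights for a plain tanh RNN cell (illustrative initialization)."""
    return (rng.normal(0, 0.1, (n_hidden, n_hidden)),  # recurrent weights
            rng.normal(0, 0.1, (n_hidden, n_in)),      # input weights
            np.zeros(n_hidden))                        # bias

def rnn_step(params, h, x):
    W_h, W_x, b = params
    return np.tanh(W_h @ h + W_x @ x + b)

def combined_loss(auto_params, chunker_params, W_out, seq, alpha=0.5):
    """Loss the automatizer would be trained to minimize (by gradient
    descent, not shown): next-input prediction error plus the error in
    imitating the chunker's hidden state at every step."""
    n_h = auto_params[0].shape[0]
    h_a, h_c = np.zeros(n_h), np.zeros(n_h)
    loss = 0.0
    for x, x_next in zip(seq[:-1], seq[1:]):
        h_a = rnn_step(auto_params, h_a, x)      # fast automatizer
        h_c = rnn_step(chunker_params, h_c, x)   # slow chunker, fixed here
        loss += np.sum((W_out @ h_a - x_next) ** 2)  # predict next input
        loss += alpha * np.sum((h_a - h_c) ** 2)     # imitate the chunker
    return loss

seq = [rng.normal(size=4) for _ in range(10)]
auto, chunker = make_params(8, 4), make_params(8, 4)
W_out = rng.normal(0, 0.1, (4, 8))
print(combined_loss(auto, chunker, W_out, seq))
```

Minimizing the imitation term pushes the chunker's knowledge down into the automatizer, so that, ideally, a single recurrent net ends up carrying the predictive knowledge of the whole hierarchy.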