Many researchers in neuro-control and reinforcement learning believe that some `compositional' method, one that learns to reach new goals by combining familiar action sequences into more complex new action sequences, is necessary to overcome the scaling problems associated with non-compositional algorithms.
The few previous ideas for attacking `compositional neural sequence learning' are inspired by dynamic programming and involve hierarchically arranged reinforcement learning networks (e.g. [Watkins, 1989], [Jameson, 1991], [Singh, 1992]; see also [Ring, 1991] for alternative ideas).
Our approach is entirely different from these previous approaches. Building on initial ideas presented in [Schmidhuber, 1991a], we describe gradient-based procedures for transforming knowledge about previously learned action sequences into appropriate subgoals for new problems; no external teacher is required. The approach is limited, however, in that it relies on differentiable (possibly adaptive) models of the costs associated with known action sequences.
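To illustrate the general idea, not the particular networks described later, the following minimal sketch derives a subgoal by gradient descent through a differentiable cost model. The quadratic cost function used here (squared Euclidean distance between states) is an assumption made purely for illustration; the point is only that a differentiable cost model lets one adjust a candidate subgoal so that the total cost of reaching it from the start plus reaching the goal from it is minimized.

```python
import numpy as np

def cost(a, b):
    # Toy differentiable cost model (an assumption for this sketch):
    # squared Euclidean distance between two states.
    return np.sum((a - b) ** 2)

def grad_wrt_subgoal(start, g, goal):
    # Gradient of cost(start, g) + cost(g, goal) with respect to the subgoal g.
    return 2.0 * (g - start) + 2.0 * (g - goal)

start = np.array([0.0, 0.0])
goal = np.array([4.0, 2.0])
g = np.array([3.5, -1.0])  # arbitrary initial subgoal

lr = 0.1
for _ in range(200):
    # Descend the model's cost surface; no external teacher supplies subgoals.
    g -= lr * grad_wrt_subgoal(start, g, goal)

# For this quadratic cost model the optimal subgoal is the midpoint of
# start and goal.
print(g)
```

Any cost model will do, provided it is differentiable with respect to the subgoal; with an adaptive (learned) model, the same gradient computation is performed through the model's parameters at their current values.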