In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991.
This paper introduces a framework for `curious neural controllers' which employ an adaptive world model for goal directed on-line learning.
First an on-line reinforcement learning algorithm for autonomous `animats' is described. The algorithm is based on two fully recurrent `self-supervised' continually running networks which learn in parallel. One of the networks learns to represent a complete model of the environmental dynamics and is called the `model network'. It provides complete `credit assignment paths' into the past for the second network which controls the animats physical actions in a possibly reactive environment. The animats goal is to maximize cumulative reinforcement and minimize cumulative `pain'.
The algorithm has properties which allow to implement something like the desire to improve the model network's knowledge about the world. This is related to curiosity. It is described how the particular algorithm (as well as similar model-building algorithms) may be augmented by dynamic curiosity and boredom in a natural manner. This may be done by introducing (delayed) reinforcement for actions that increase the model network's knowledge about the world. This in turn requires the model network to model its own ignorance, thus showing a rudimentary form of self-introspective behavior.