In Proc. International Joint Conference on Neural
Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991.
A controller is a device which receives inputs from a (dynamic)
environment and produces outputs that manipulate the environmental
state. A model-building control system is a controller with
an additional module (the `world model')
which is trained to predict future inputs from previous input/action
pairs.
The novel curious model-building control system described in this
paper is a model-building control system that actively tries to
provoke situations in which it has learned to expect to learn
something about the environment.
Such a system has been implemented as a 4-network system based on
Watkins' Q-learning algorithm, which can be used to maximize the
expectation of the temporal derivative of the adaptive assumed
reliability of future predictions.
An experiment with an artificial non-deterministic environment
demonstrates that the system can be superior to previous
model-building control systems, which do not address the problem of
modelling the reliability of the world model's predictions in
uncertain environments and which use ad-hoc methods (such as random
search) to train the world model.
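
The idea of rewarding changes in assumed reliability can be
illustrated with a minimal tabular sketch. This is not the 4-network
implementation described in the paper: the class name CuriousAgent,
the lookup tables, the scalar observations, the running-average
updates, and all parameter values are assumptions chosen only for
illustration. The sketch keeps a world model, an adaptive estimate of
that model's prediction error, and a Q-table trained by one-step
Q-learning on the decrease of the assumed error.

```python
from collections import defaultdict


class CuriousAgent:
    """Tabular illustration (hypothetical, not the paper's networks)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, lr=0.5):
        self.actions = list(actions)
        self.alpha = alpha  # Q-learning step size (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.lr = lr        # step size for model and error tables (assumed)
        # World model: predicted next observation per (state, action).
        self.model = defaultdict(float)
        # Assumed prediction error per (state, action); low error
        # corresponds to high assumed reliability. Starts pessimistic.
        self.assumed_error = defaultdict(lambda: 1.0)
        # Q-values over (state, action), trained on curiosity reward.
        self.q = defaultdict(float)

    def observe(self, state, action, next_obs, next_state):
        key = (state, action)

        # Actual error of the world model before it is updated.
        actual_error = abs(next_obs - self.model[key])

        # Train the world model towards the observed outcome.
        self.model[key] += self.lr * (next_obs - self.model[key])

        # Adapt the assumed error towards the actual error.
        old_assumed = self.assumed_error[key]
        self.assumed_error[key] += self.lr * (actual_error - old_assumed)

        # Curiosity reward: decrease of assumed error, i.e. the change
        # over time of the assumed reliability.
        reward = old_assumed - self.assumed_error[key]

        # One-step Q-learning update using the curiosity reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[key] += self.alpha * (td_target - self.q[key])
        return reward


agent = CuriousAgent(actions=[0, 1])
first = agent.observe(state=0, action=0, next_obs=1.0, next_state=0)
second = agent.observe(state=0, action=0, next_obs=1.0, next_state=0)
print(first, second, agent.q[(0, 0)])
```

In this sketch the reward is positive only while the assumed error for
a state/action pair is falling. Under these assumptions, a pair that
is already well predicted yields reward near zero, and a pair whose
outcome is pure noise should also stop yielding sustained reward once
its assumed error settles, so a greedy policy on the Q-table is drawn
towards situations where the model is still improving.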