**Jürgen Schmidhuber**^{1}

TUM

**In J. A. Meyer and S. W. Wilson, editors,
Proc. of the International Conference on Simulation of
Adaptive Behavior: From Animals to Animats, pages 222-227.
MIT Press/Bradford Books, 1991.**

This paper introduces a framework for `curious neural controllers' which employ an adaptive world model for goal-directed on-line learning.

First, an on-line reinforcement learning algorithm for autonomous `animats' is described. The algorithm is based on two fully recurrent `self-supervised' continually running networks which learn in parallel. One of the networks, called the `model network', learns to represent a complete model of the environmental dynamics. It provides complete `credit assignment paths' into the past for the second network, which controls the animat's physical actions in a possibly reactive environment. The animat's goal is to maximize cumulative reinforcement and minimize cumulative `pain'.

The algorithm has properties that make it possible to implement
something like *the desire to improve the model network's
knowledge about the world*. This is related to
*curiosity*.
It is described how the particular algorithm (as well as
similar model-building algorithms) may be augmented by dynamic
*curiosity* and *boredom* in a natural manner.
This may be done by introducing
(delayed) reinforcement for actions that increase the
model network's knowledge about the world. This in turn requires
the model network *to model its own ignorance*, thus
showing a rudimentary form of *self-introspective* behavior.
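
As a rough illustration of reinforcement for actions that increase the model's knowledge, the following minimal sketch rewards a learner by how much one learning step reduces its prediction error on the current situation. It is not the paper's recurrent architecture: `LinearModel`, `curiosity_reward`, the learning rate, and the use of an error drop as a proxy for "increase in knowledge" are all assumptions made for this example.

```python
class LinearModel:
    """Hypothetical toy stand-in for the 'model network': predicts one scalar."""

    def __init__(self, n_inputs, lr=0.1):
        self.w = [0.0] * n_inputs
        self.lr = lr  # learning rate (assumed value)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def error(self, x, target):
        # Squared prediction error on a single situation
        return (target - self.predict(x)) ** 2

    def update(self, x, target):
        # One delta-rule (gradient) step on the squared error
        delta = target - self.predict(x)
        self.w = [wi + self.lr * delta * xi for wi, xi in zip(self.w, x)]


def curiosity_reward(model, x, target):
    """Assumed proxy for curiosity: error before learning minus error after."""
    before = model.error(x, target)
    model.update(x, target)
    after = model.error(x, target)
    return before - after


# Visiting the same situation repeatedly: the reward fades as it becomes predictable.
model = LinearModel(n_inputs=2)
x, target = [1.0, 0.5], 1.0
rewards = [curiosity_reward(model, x, target) for _ in range(50)]
```

In this toy setting the first visit yields the largest reward, and later visits yield almost none, which loosely mirrors the idea of *boredom*: a situation the model already predicts well no longer pays off.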

- 1. Introduction
- 2. Implementing Dynamic Curiosity and Boredom
- Concluding Remarks
- Bibliography
