There are at least two important problems with all of these approaches that have not been addressed so far:

*1. Previous model-building control systems
are not well-suited for
uncertain, non-deterministic environments.* In particular,
they do not
model the reliability of the adaptive
world model's predictions. If credit assignment for the controller
is based on the assumption of a correct world model, unexpected
results may be obtained.

*2. Previous model-building control systems
employ some ad-hoc method for establishing the world model.*
For instance,
[2], [3], [10], and others use random
search to train the world model.
[11] uses a local input/output representation
and makes the probability of conducting a
certain training experiment depend on the time elapsed since
the system last performed an experiment of the same type.
These methods work fine for certain problems, but they
do not address the challenges of real-world tasks in uncertain
environments. There are at least two (related) sources
of efficiency neglected by these approaches:

*2A. Not much additional training time should be wasted
on exploring those parts of the world which are already well-modelled.*

*2B. Not much additional training time should be wasted
on exploring those parts of the world where the expectation of future
improvement of the world model is low.*

The first contribution of this paper (section 2) is to show how one can adaptively model the reliability of a predictor's predictions.
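One simple way to realize this kind of reliability modeling (a sketch under assumed names and a toy environment, not the paper's exact construction) is to train a second function approximator alongside the world model to predict the world model's own squared prediction error: inputs for which the predicted error is high are inputs where the model is unreliable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch: a linear world model plus a companion "confidence" model that
# learns to predict the world model's own squared error.  The toy
# environment is deterministic for x > 0 and noisy for x < 0, so the
# confidence model should learn that predictions are less reliable there.
w_model = np.zeros(2)   # world-model weights (predict the observation)
w_conf = np.zeros(2)    # confidence weights (predict squared model error)
lr = 0.05

def features(x):
    return np.array([x, 1.0])   # simple affine features

for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    y = 2.0 * x + (rng.normal(0.0, 0.5) if x < 0 else 0.0)

    phi = features(x)
    err = y - w_model @ phi
    w_model += lr * err * phi                    # LMS update of the model

    # Train the confidence model on the observed squared error.
    w_conf += lr * (err ** 2 - w_conf @ phi) * phi

# Predicted unreliability should be higher in the noisy region:
print(w_conf @ features(-0.5), w_conf @ features(0.5))
```

The key point is that the confidence model is trained by ordinary supervised learning on an observable target (the model's realized error), so no extra knowledge about the environment's noise is needed.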

The second (and most important) contribution of this paper (section 3) is to show how reinforcement learning can be used for teaching a model-building control system to actively generate training examples for increasing the reliability of its world model's predictions. This is relevant for the problem of 'on-line state space exploration'. The approach is based on learning to estimate the effects of further learning.
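A minimal numeric sketch of this idea (the two-region environment and the particular progress measure are illustrative assumptions, not the paper's exact formulation): reward exploration in proportion to the measured *decrease* of the world model's prediction error. Then parts of the world that are already well-modelled (2A) and parts of irreducible noise, where no future improvement can be expected (2B), both yield little curiosity reward, while still-learnable regions yield much.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_trace(observe, n=100, lr=0.1):
    """Train a scalar predictor online; return the squared error at each step."""
    pred = 0.0
    errs = []
    for _ in range(n):
        y = observe()
        errs.append((y - pred) ** 2)
        pred += lr * (y - pred)   # simple online world-model update
    return np.array(errs)

# Illustrative two-region world: region 0 is deterministic (learnable),
# region 1 is irreducible noise (unlearnable).
errs0 = error_trace(lambda: 3.0)
errs1 = error_trace(lambda: rng.normal(0.0, 0.5))

def progress(errs):
    """Curiosity reward: early average error minus late average error,
    i.e. an estimate of how much further learning improved the model."""
    half = len(errs) // 2
    return errs[:half].mean() - errs[half:].mean()

# Learning progress singles out the learnable region, whereas raw
# prediction error alone would keep attracting the explorer to the noise.
print(progress(errs0), progress(errs1))
print(errs0[50:].mean(), errs1[50:].mean())
```

Defining the reward as error *reduction* rather than raw error is what addresses point 2B above: a noisy region keeps its prediction error high, but offers no expected improvement and hence no lasting curiosity reward.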
