There are at least two important problems with all of these approaches that have not been addressed so far:
1. Previous model-building control systems are not well-suited for uncertain non-deterministic environments. In particular, they do not model the reliability of the predictions of the adaptive world models. Therefore, if credit assignment for the controller rests on the assumption of a correct world model, the controller may be trained on misleading information wherever the model is in fact wrong.
2. Previous model-building control systems employ ad-hoc methods for establishing the world model. For instance, [2], [3], [10], and others use random search to train the world model. [11] uses a local input/output representation and makes the probability of conducting a certain training experiment dependent on the time elapsed since the system last performed an experiment of the same type. These methods work well for certain problems, but they do not address the challenges of real-world tasks in uncertain environments. There are at least two (related) sources of efficiency which are neglected by these approaches:
2A. Not much additional training time should be wasted on exploring those parts of the world which are already well-modelled.
2B. Not much additional training time should be wasted on exploring those parts of the world where the expectation of future improvement of the world model is low.
The first contribution of this paper (section 2) is to show how one can adaptively model the reliability of a predictor's predictions.
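The idea can be illustrated with a minimal sketch (the class and method names below are assumptions for illustration, not the paper's notation): alongside a world model that predicts outcomes, one maintains a second adaptive estimate of the world model's own squared prediction error per state, so that reliability is high exactly where expected error is low.

```python
# Hypothetical sketch of adaptively modelling a predictor's reliability.
# A running average of the predictor's squared error is kept per state;
# reliability is defined to be high where the expected error is low.

class ReliabilityEstimator:
    """Tracks a running estimate of a predictor's squared error per state."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.expected_sq_error = {}  # state -> estimated squared prediction error

    def update(self, state, prediction, outcome):
        """Observe one (prediction, outcome) pair and update the error estimate."""
        err = (prediction - outcome) ** 2
        old = self.expected_sq_error.get(state, err)
        self.expected_sq_error[state] = old + self.lr * (err - old)
        return err

    def reliability(self, state):
        """Map expected squared error into (0, 1]; unseen states get 0."""
        return 1.0 / (1.0 + self.expected_sq_error.get(state, float("inf")))
```

In a non-deterministic environment this estimator converges toward the irreducible noise level of each state, so the controller can distinguish "not yet learned" from "inherently unpredictable".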
The second (and most important) contribution of this paper (section 3) is to show how reinforcement learning can be used for teaching a model-building control system to actively generate training examples for increasing the reliability of the predictions of its world model. This is relevant for the problem of `on-line state space exploration'. The approach is based on learning to estimate the effects of further learning.
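A minimal sketch of this idea, under assumptions of my own (a tabular world model and the function names below are illustrative, not the paper's formulation): the intrinsic reward for a training experience is the measured *decrease* in the world model's prediction error caused by one further learning step on that experience. Well-modelled regions (criterion 2A) and regions where no further improvement can be expected (criterion 2B) both yield little reward.

```python
# Illustrative sketch: intrinsic reward as estimated learning progress
# of the world model. Names and the tabular model are assumptions.

class TabularWorldModel:
    """Toy world model: one running-average prediction per (state, action)."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # (state, action) -> predicted outcome

    def prediction_error(self, state, action, outcome):
        return (self.pred.get((state, action), 0.0) - outcome) ** 2

    def train(self, state, action, outcome):
        p = self.pred.get((state, action), 0.0)
        self.pred[(state, action)] = p + self.lr * (outcome - p)

def curiosity_reward(world_model, state, action, outcome):
    """Reward = reduction in prediction error due to one learning step."""
    err_before = world_model.prediction_error(state, action, outcome)
    world_model.train(state, action, outcome)  # one further learning step
    err_after = world_model.prediction_error(state, action, outcome)
    return err_before - err_after              # estimated learning progress
```

Feeding this reward to a standard reinforcement learning controller makes it seek out experiences where the model still improves, and the reward for any fixed experience shrinks as that part of the world becomes well-modelled.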