There are at least two important problems with all of these approaches that have not been addressed so far:
1. Previous model-building control systems are not well-suited for uncertain non-deterministic environments. In particular, they do not model the reliability of the predictions of the adaptive world models. Therefore, if credit assignment for the controller rests on the assumption of a correct world model, the controller may be trained on misleading information wherever the model is in fact wrong.
2. Previous model-building control systems employ ad-hoc methods for establishing the world model. For instance, [2], [3], [10], and others use random search to train the world model. [11] uses a local input/output representation and makes the probability of conducting a certain training experiment dependent on the time elapsed since the system last performed an experiment of the same type. These methods work well for certain problems, but they do not address the challenges of real-world tasks in uncertain environments. There are at least two (related) sources of efficiency which are neglected by these approaches:
2A. Not much additional training time should be wasted on exploring those parts of the world which are already well-modelled.
2B. Not much additional training time should be wasted on exploring those parts of the world where the expectation of future improvement of the world model is low.
The first contribution of this paper (section 2) is to show how one can adaptively model the reliability of a predictor's predictions.
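The idea can be illustrated with a minimal sketch (the class and method names below are assumptions for illustration, not the paper's notation): alongside a world model that predicts outcomes, one maintains a second adaptive estimate of the world model's own squared prediction error per state, so that reliability is high exactly where expected error is low.

```python
# Hypothetical sketch of adaptively modelling a predictor's reliability.
# A running average of the predictor's squared error is kept per state;
# reliability is defined to be high where the expected error is low.

class ReliabilityEstimator:
    """Tracks a running estimate of a predictor's squared error per state."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.expected_sq_error = {}  # state -> estimated squared prediction error

    def update(self, state, prediction, outcome):
        """Observe one (prediction, outcome) pair and update the error estimate."""
        err = (prediction - outcome) ** 2
        old = self.expected_sq_error.get(state, err)
        self.expected_sq_error[state] = old + self.lr * (err - old)
        return err

    def reliability(self, state):
        """Map expected squared error into (0, 1]; unseen states get 0."""
        return 1.0 / (1.0 + self.expected_sq_error.get(state, float("inf")))
```

In a non-deterministic environment this estimator converges toward the irreducible noise level of each state, so the controller can distinguish "not yet learned" from "inherently unpredictable".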
The second (and most important) contribution of this paper (section 3) is to show how reinforcement learning can be used for teaching a model-building control system to actively generate training examples for increasing the reliability of the predictions of its world model. This is relevant for the problem of `on-line state space exploration'. The approach is based on learning to estimate the effects of further learning.
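A minimal sketch of this idea, under assumptions of my own (a tabular world model and the function names below are illustrative, not the paper's formulation): the intrinsic reward for a training experience is the measured *decrease* in the world model's prediction error caused by one further learning step on that experience. Well-modelled regions (criterion 2A) and regions where no further improvement can be expected (criterion 2B) both yield little reward.

```python
# Illustrative sketch: intrinsic reward as estimated learning progress
# of the world model. Names and the tabular model are assumptions.

class TabularWorldModel:
    """Toy world model: one running-average prediction per (state, action)."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # (state, action) -> predicted outcome

    def prediction_error(self, state, action, outcome):
        return (self.pred.get((state, action), 0.0) - outcome) ** 2

    def train(self, state, action, outcome):
        p = self.pred.get((state, action), 0.0)
        self.pred[(state, action)] = p + self.lr * (outcome - p)

def curiosity_reward(world_model, state, action, outcome):
    """Reward = reduction in prediction error due to one learning step."""
    err_before = world_model.prediction_error(state, action, outcome)
    world_model.train(state, action, outcome)  # one further learning step
    err_after = world_model.prediction_error(state, action, outcome)
    return err_before - err_after              # estimated learning progress
```

Feeding this reward to a standard reinforcement learning controller makes it seek out experiences where the model still improves, and the reward for any fixed experience shrinks as that part of the world becomes well-modelled.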