Next: ADAPTIVE CURIOSITY Up: CURIOUS MODEL-BUILDING CONTROL SYSTEMS Previous: INTRODUCTION

Consider an adaptive discrete time `predictor' M (not necessarily a neural network) whose input at time t is the real vector x(t) and whose output at time t is the real vector y(t), where the real vector h_M(t) represents the internal state of M. Meaningful internal states are required if the prediction task requires M to memorize past events. At time t there is a target output d(t). The predictor's goal is to make y(t) = d(t) for all t.

After having provided a number of training examples for M, M usually will still make some errors, particularly if the training environment is noisy. How can we model the reliability of M's predictions?

We introduce an additional `confidence module' C (not necessarily a neural network) whose input at time t is the real vector x(t) and whose output at time t is the real vector c(t), where the real vector h_C(t) is the internal state of C. At time t there is a target output c*(t) for the confidence module. c(t) should provide information about how reliable M's prediction can be expected to be [8] [5] [7].

In what follows, v_j is the j-th component of a vector v, E denotes the expectation operator, dim(v) denotes the dimensionality of vector v, |a| denotes the absolute value of scalar a, P(A | B) denotes the conditional probability of A given B, and E(A | B) denotes the conditional expectation of A given B. For simplicity, we will concentrate on the case of h_M(t) = h_C(t) = 0 for all t. This means that M's and C's current outputs are based only on the current input x(t). There is a variety of simple ways of representing reliability in c(t):

1. Modelling probabilities of global prediction failures. Let c(t) be one-dimensional. Let c*(t) = P(y(t) = d(t) | x(t)). c*(t) can be estimated by n_match(x(t)) / n(x(t)), where n_match(x(t)) is the number of those times τ ≤ t with x(τ) = x(t) and y(τ) = d(τ), and where n(x(t)) is the number of those times τ ≤ t with x(τ) = x(t).
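Under the memoryless assumption h_M(t) = 0, the counting estimator of method 1 can be sketched as follows (the class and variable names here are illustrative, not from the text):

```python
from collections import defaultdict

class GlobalConfidence:
    """Estimate P(y(t) = d(t) | x(t)) by counting exact matches per input."""

    def __init__(self):
        self.n = defaultdict(int)        # times input x was observed
        self.n_match = defaultdict(int)  # times the prediction equaled the target for x

    def update(self, x, y, d):
        key = tuple(x)
        self.n[key] += 1
        if tuple(y) == tuple(d):
            self.n_match[key] += 1

    def estimate(self, x):
        key = tuple(x)
        if self.n[key] == 0:
            return None  # input never seen: no estimate available
        return self.n_match[key] / self.n[key]
```

Note that this sketch presupposes a discrete input set, so that exact repetitions of x(t) actually occur.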

2. Modelling probabilities of local prediction failures. Let c(t) be dim(d(t))-dimensional. Let c*_j(t) = P(y_j(t) = d_j(t) | x(t)) for all appropriate j. c*_j(t) can be estimated by n_j(x(t)) / n(x(t)), where n_j(x(t)) is the number of those times τ ≤ t with x(τ) = x(t) and y_j(τ) = d_j(τ), and where n(x(t)) is the number of those times τ ≤ t with x(τ) = x(t).
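The per-component variant of the counting estimator can be sketched in the same way (again, names are illustrative):

```python
from collections import defaultdict

class LocalConfidence:
    """Estimate P(y_j(t) = d_j(t) | x(t)) separately for each component j."""

    def __init__(self, dim):
        self.dim = dim
        self.n = defaultdict(int)                  # visits of input x
        self.n_j = defaultdict(lambda: [0] * dim)  # per-component match counts

    def update(self, x, y, d):
        key = tuple(x)
        self.n[key] += 1
        for j in range(self.dim):
            if y[j] == d[j]:
                self.n_j[key][j] += 1

    def estimate(self, x):
        key = tuple(x)
        if self.n[key] == 0:
            return None
        return [self.n_j[key][j] / self.n[key] for j in range(self.dim)]
```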

Variations of method 1 and method 2 would not measure the probabilities of exact matches between predictions and reality but the probability of `near-matches' within a certain (e.g. Euclidean) tolerance.
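Such a near-match test can be sketched as a drop-in replacement for the exact-equality test in the counting estimators above (the function name and default tolerance are illustrative assumptions):

```python
def near_match(y, d, tol=0.1):
    """Count a prediction as correct if it lies within Euclidean tolerance of the target."""
    dist2 = sum((yj - dj) ** 2 for yj, dj in zip(y, d))
    return dist2 <= tol ** 2
```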

3. Modelling global expected error. Let c(t) be one-dimensional. Let

c*(t) = E( ||d(t) - y(t)||^2 | x(t) ).

If C is a back-propagation net (e.g. [14]), an approximation of c*(t) can be obtained by using gradient descent (with a small learning rate) for training C at time t to emit M's error ||d(t) - y(t)||^2. This is a special case of the method described in [8] (there a fully recurrent net was employed). Of course, other error functions are possible. For instance, in the experiments described below the confidence network predicted the absolute value of the difference between M's (one-dimensional) output and the current target value.
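A minimal sketch of method 3, assuming a single linear confidence unit in place of a full back-propagation net (the dimensions, learning rate, and names are illustrative assumptions):

```python
import numpy as np

dim_x, lr = 3, 0.01   # illustrative input size and learning rate
w = np.zeros(dim_x)   # weights of a linear confidence unit C (bias omitted)

def confidence(x):
    """c(t): C's estimate of M's expected squared error on input x."""
    return w @ x

def train_step(x, y, d):
    """One gradient-descent step pushing c(t) toward M's current error ||d - y||^2."""
    global w
    target = np.sum((d - y) ** 2)
    w -= lr * (confidence(x) - target) * x  # gradient of 0.5*(c - target)^2 w.r.t. w
```

Trained on repeated presentations of the same input with noisy targets, the unit's output approaches the conditional expectation of the squared error, i.e. an approximation of c*(t).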

4. Modelling local expected error. Let c(t) be dim(d(t))-dimensional. Let

c*_j(t) = E( (d_j(t) - y_j(t))^2 | x(t) )

for all appropriate j. If C is a back-propagation net, an approximation of c*(t) can be obtained by using gradient descent (with a small learning rate) for training C at time t to emit the vector e(t) of M's local prediction errors,

e_j(t) = (d_j(t) - y_j(t))^2,

where j ranges over all appropriate components.
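Analogously, a minimal sketch of method 4 with one linear output unit per component of d(t) (again, the names, dimensions, and learning rate are illustrative assumptions, not from the text):

```python
import numpy as np

dim_x, dim_d, lr = 3, 2, 0.01  # illustrative sizes and learning rate
W = np.zeros((dim_d, dim_x))   # weights of a linear confidence module C

def confidence(x):
    """c(t): per-component estimates of M's expected local squared errors."""
    return W @ x

def train_step(x, y, d):
    """One gradient-descent step pushing each c_j(t) toward e_j(t) = (d_j - y_j)^2."""
    global W
    e = (d - y) ** 2
    W -= lr * np.outer(confidence(x) - e, x)
```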

Juergen Schmidhuber 2003-02-28
