After having provided a number of training examples for the predictor M, M usually will still make some errors, particularly if the training environment is noisy. How can we model the reliability of M's predictions?

We introduce an additional `confidence module' C (not necessarily a neural network) whose input at time t is the real vector x(t) and whose output at time t is the real vector c(t), where the real vector h(t) denotes the internal state of M. At time t there is a target output conf(t) for the confidence module. conf(t) should provide information about how reliable M's prediction y(t) can be expected to be [8] [5] [7].

In what follows, v_i is the ith component of a vector v, E denotes the expectation operator, dim(v) denotes the dimensionality of vector v, |a| denotes the absolute value of scalar a, P(A | B) denotes the conditional probability of A given B, and E(A | B) denotes the conditional expectation of A given B. For simplicity, we will concentrate on the case of h(t) = 0 for all t. This means that M's and C's current outputs are based only on the current input x(t). There is a variety of simple ways of representing reliability in conf(t):

*1. Modelling probabilities of global prediction failures.*
Let c(t) be one-dimensional. Let

conf(t) = P( y(t) = d(t) | x(t) ),

where d(t) is the current target vector. P( y(t) = d(t) | x(t) ) can be estimated by a(t) / (a(t) + b(t)), where a(t) is the number of those times tau <= t with x(tau) = x(t) and y(tau) = d(tau), and where b(t) is the number of those times tau <= t with x(tau) = x(t) and y(tau) ≠ d(tau).
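The counting estimator of method 1 can be sketched as follows. This is a hypothetical illustration, not code from the original work: the class and method names, and the use of hashable tuples as inputs, are my own assumptions.

```python
from collections import defaultdict

# Sketch of method 1: for each distinct input, count how often the
# predictor's output exactly matched the target (a) versus missed it (b),
# and estimate the success probability as a / (a + b).
class ConfidenceCounter:
    def __init__(self):
        # input (a hashable tuple) -> [match count a, mismatch count b]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, x, prediction, target):
        """Record whether the prediction for input x matched the target."""
        if prediction == target:
            self.counts[x][0] += 1   # a: exact match
        else:
            self.counts[x][1] += 1   # b: mismatch

    def estimate(self, x):
        """Return a / (a + b), or None if input x was never seen."""
        a, b = self.counts[x]
        if a + b == 0:
            return None
        return a / (a + b)

cc = ConfidenceCounter()
cc.update((0, 1), prediction=(1,), target=(1,))  # success
cc.update((0, 1), prediction=(0,), target=(1,))  # failure
print(cc.estimate((0, 1)))  # -> 0.5
```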

*2. Modelling probabilities of local prediction failures.*
Let c(t) be dim(d(t))-dimensional. Let

conf_i(t) = P( y_i(t) = d_i(t) | x(t) )

for all appropriate i. P( y_i(t) = d_i(t) | x(t) ) can be estimated by a_i(t) / (a_i(t) + b_i(t)), where a_i(t) is the number of those times tau <= t with x(tau) = x(t) and y_i(tau) = d_i(tau), and where b_i(t) is the number of those times tau <= t with x(tau) = x(t) and y_i(tau) ≠ d_i(tau).
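Method 2 keeps the same counts per component of the prediction. A minimal sketch, again with illustrative names and data structures of my own choosing:

```python
import numpy as np

# Sketch of method 2: per-component success counts. For each input we keep
# two count vectors: matches and mismatches for each prediction component.
counts = {}  # input (as a tuple) -> [match counts, mismatch counts]

def update(x, prediction, target):
    key = tuple(x)
    if key not in counts:
        n = len(target)
        counts[key] = [np.zeros(n), np.zeros(n)]
    hit = (np.asarray(prediction) == np.asarray(target))
    counts[key][0] += hit            # component i matched the target
    counts[key][1] += ~hit           # component i missed the target

def estimate(x):
    a, b = counts[tuple(x)]
    return a / (a + b)               # per-component success probabilities

update([0], [1, 0], [1, 1])          # second component wrong
update([0], [1, 1], [1, 1])          # both components right
print(estimate([0]))                 # first component: 1.0, second: 0.5
```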

Variations of method 1 and method 2 would not
measure the probabilities of exact matches between predictions
and reality but the probability of `near-matches' within a certain (e.g.
Euclidean) tolerance.
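The near-match test in this variation could be sketched as follows; the function name and the tolerance value are illustrative assumptions.

```python
import numpy as np

# A prediction counts as a `near-match' when it lies within Euclidean
# distance eps of the target, instead of requiring exact equality.
def near_match(prediction, target, eps=0.1):
    diff = np.asarray(prediction) - np.asarray(target)
    return bool(np.linalg.norm(diff) <= eps)

print(near_match([0.95, 1.02], [1.0, 1.0]))  # -> True
print(near_match([0.5, 1.0], [1.0, 1.0]))    # -> False
```

With this test in place of exact equality, the counters a(t) and b(t) of methods 1 and 2 are updated exactly as before.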

*3. Modelling global expected error.*
Let c(t) be one-dimensional. Let

conf(t) = E( e(t) | x(t) ),

where e(t) denotes M's current prediction error, e.g. the Euclidean distance between M's prediction y(t) and the current target d(t).

If C is a back-propagation net (e.g. [14]), an approximation of conf(t) can be obtained by using gradient descent (with a small learning rate) for training C at time t to emit M's error e(t). This is a special case of the method described in [8] (there a fully recurrent net was employed). Of course, other error functions are possible. For instance, in the experiments described below the confidence network predicted the absolute value of the difference between M's (one-dimensional) output and the current target value.
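To illustrate the idea of training a confidence module by gradient descent on the predictor's current error, here is a minimal sketch. It substitutes a linear model for a full back-propagation net, uses a toy noisy target, and all names and constants are my own assumptions.

```python
import numpy as np

# A linear confidence model trained online by gradient descent (squared
# loss, small learning rate) to emit the predictor's absolute error.
rng = np.random.default_rng(0)
w = np.zeros(3)                          # weights of the model (2 inputs + bias)

def C(x):
    """Confidence model's output for input x (linear in x, plus bias)."""
    return np.append(x, 1.0) @ w

lr = 0.01                                # small learning rate, as the text suggests
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0, size=2)
    y = x.sum()                          # stand-in for the predictor's output
    d = x.sum() + rng.normal(0.0, 0.5)   # noisy target: the error is pure noise
    err = abs(y - d)                     # the predictor's current absolute error
    w -= lr * (C(x) - err) * np.append(x, 1.0)  # gradient step toward err

# Since the noise is independent of x, C should converge toward
# E|N(0, 0.5)| = 0.5 * sqrt(2 / pi), roughly 0.4, for any input.
print(C(np.array([0.0, 0.0])))
```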

*4. Modelling local expected error.*
Let c(t) be dim(d(t))-dimensional.
Let

conf_i(t) = E( |y_i(t) - d_i(t)| | x(t) )

for all appropriate i. If C is a back-propagation net, an approximation of conf(t) can be obtained by using gradient descent (with a small learning rate) for training C at time t to emit M's local prediction errors e(t),

where e_i(t) = |y_i(t) - d_i(t)|.
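The training target for this method is simply the vector of componentwise absolute errors; a tiny sketch (the function name is hypothetical):

```python
import numpy as np

# The local-error target vector the confidence module is trained to emit:
# one absolute error per component of the prediction.
def local_error_target(prediction, target):
    return np.abs(np.asarray(prediction) - np.asarray(target))

print(local_error_target([0.2, 0.9, 1.5], [0.0, 1.0, 1.0]))  # -> [0.2 0.1 0.5]
```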
