
PREDICTING ERROR CHANGES DIRECTLY

The reinforcement generating mechanism of the reinforcement learning systems described above can be modified in various ways. For instance, define $\bar{o}_M(t)$ as $M$'s response to $i_M(t)$ after $M$ has been adjusted at time $t$. We can replace the confidence network with a network $H$ which at every time step receives the current input $i_M(t)$ and whose target output is the current change of $M$'s output, $\Delta o_M(t) = o_M(t) - \bar{o}_M(t)$, caused by $M$'s learning algorithm ($H$ should have a small learning rate). $H$ will learn approximations of the expectations

\begin{displaymath}E \left\{ \Delta o_M(t) \mid i_M(t) \right\} \end{displaymath}

of the changes of $M$'s responses to given inputs. The absolute value $\vert o_H(t) \vert$ of $H$'s output $o_H(t)$ (an approximation of $\vert E \left\{ \Delta o_M(t) \mid i_M(t) \right\} \vert$) should be taken as the reinforcement for the adaptive critic or for the Q-learning algorithm (the particular reinforcement learning algorithm does not have to be specified here). The control system's curiosity goal at time $t_0$ then is to maximize

\begin{displaymath}E \left\{ \sum_{t \geq t_0} - \gamma^{t - t_0} \vert o_H(t) \vert \right\}, \end{displaymath}

where $0 \leq \gamma < 1$ is a discount rate. An alternative would be to make predictions about the (discounted) sum of future changes of $M$'s weight vector and to use these predictions in an analogous manner.
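To make the mechanism concrete, here is a minimal sketch (not part of the original formulation) assuming both $M$ and $H$ are simple linear networks trained by plain gradient descent. The weight matrices W_M and W_H, the learning rates eta_M and eta_H, the toy target signal, and the use of the summed absolute value of $H$'s output vector as $\vert o_H(t) \vert$ are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_out = 4, 2                                # sizes of i_M(t) and o_M(t)
W_M = rng.normal(scale=0.1, size=(dim_out, dim_in))   # model network M (linear here)
W_H = np.zeros((dim_out, dim_in))                     # H predicts Delta o_M(t) from i_M(t)
eta_M, eta_H = 0.1, 0.01                              # H gets the smaller learning rate

for t in range(100):
    i_M = rng.normal(size=dim_in)                     # current model input i_M(t)
    target = np.tanh(i_M[:dim_out])                   # stand-in for what M should predict

    o_M = W_M @ i_M                                   # o_M(t): M's response before learning
    W_M -= eta_M * np.outer(o_M - target, i_M)        # adjust M at time t
    o_M_bar = W_M @ i_M                               # bar{o}_M(t): response after the adjustment
    delta_o_M = o_M - o_M_bar                         # Delta o_M(t) = o_M(t) - bar{o}_M(t)

    o_H = W_H @ i_M                                   # H's prediction of Delta o_M(t)
    r = np.sum(np.abs(o_H))                           # |o_H(t)|: reinforcement handed to the
                                                      # adaptive critic or Q-learning algorithm
    W_H -= eta_H * np.outer(o_H - delta_o_M, i_M)     # slow step of H toward the observed change

In the full system, the per-step value r would be passed to whichever reinforcement learning algorithm adjusts the controller, which in turn optimizes the discounted objective displayed above.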

