
PREDICTING ERROR CHANGES DIRECTLY

The reinforcement generating mechanism of the reinforcement learning systems described above can be modified in various ways. For instance, define $\bar{o}_M(t)$ as $M$'s response to $i_M(t)$ after $M$ has been adjusted at time $t$. We can replace the confidence network with a network $H$ which at every time step receives the current input $i_M(t)$ and whose target output is the current change of $M$'s output, $\Delta o_M(t) = o_M(t) - \bar{o}_M(t)$, caused by $M$'s learning algorithm ($H$ should have a small learning rate). $H$ will learn approximations of the expectations

\begin{displaymath}E \left\{ \Delta o_M(t) \mid i_M(t) \right\} \end{displaymath}

of the changes of $M$'s responses to given inputs. The absolute value $\vert o_H(t) \vert$ of $H$'s output $o_H(t)$ (an approximation of $\vert E \left\{ \Delta o_M(t) \mid i_M(t) \right\} \vert$) should be taken as the reinforcement for the adaptive critic or for the Q-learning algorithm (the particular reinforcement learning algorithm does not have to be specified here). The control system's curiosity goal at time $t_0$ then is to maximize

\begin{displaymath}E \left\{ \sum_{t \geq t_0} - \gamma^{t - t_0} \vert o_H(t) \vert \right\}, \end{displaymath}

where $0 \leq \gamma < 1$ is a discount rate. An alternative would be to make predictions about the (discounted) sum of future changes of $M$'s weight vector and to use these predictions in an analogous manner.
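To make the mechanism concrete, here is a minimal sketch (not part of the original formulation) assuming both $M$ and $H$ are simple linear networks trained by plain gradient descent. The weight matrices W_M and W_H, the learning rates eta_M and eta_H, the toy target signal, and the use of the summed absolute value of $H$'s output vector as $\vert o_H(t) \vert$ are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_out = 4, 2                                # sizes of i_M(t) and o_M(t)
W_M = rng.normal(scale=0.1, size=(dim_out, dim_in))   # model network M (linear here)
W_H = np.zeros((dim_out, dim_in))                     # H predicts Delta o_M(t) from i_M(t)
eta_M, eta_H = 0.1, 0.01                              # H gets the smaller learning rate

for t in range(100):
    i_M = rng.normal(size=dim_in)                     # current model input i_M(t)
    target = np.tanh(i_M[:dim_out])                   # stand-in for what M should predict

    o_M = W_M @ i_M                                   # o_M(t): M's response before learning
    W_M -= eta_M * np.outer(o_M - target, i_M)        # adjust M at time t
    o_M_bar = W_M @ i_M                               # bar{o}_M(t): response after the adjustment
    delta_o_M = o_M - o_M_bar                         # Delta o_M(t) = o_M(t) - bar{o}_M(t)

    o_H = W_H @ i_M                                   # H's prediction of Delta o_M(t)
    r = np.sum(np.abs(o_H))                           # |o_H(t)|: reinforcement handed to the
                                                      # adaptive critic or Q-learning algorithm
    W_H -= eta_H * np.outer(o_H - delta_o_M, i_M)     # slow step of H toward the observed change

In the full system, the per-step value r would be passed to whichever reinforcement learning algorithm adjusts the controller, which in turn optimizes the discounted objective displayed above.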

