
THE BASIC PRINCIPLE

This subsection discusses a rather general principle of adaptive curiosity. Here it does not matter whether the adaptive world model is implemented as a back-propagation network, as a lookup table, or as something else. Certain natural implementations of the idea are discussed in the following subsections.

The basic principle can be formulated as follows: Learn a mapping from actions (or action sequences) to the expectation of future performance improvement of the world model. Encourage action sequences where this expectation is high.
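As a minimal illustration (not part of the original formulation; the function and variable names below are hypothetical), the principle amounts to scoring candidate actions by a learned estimate of expected future world model improvement and preferring the highest-scoring one:

from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")

def select_curious_action(
    candidate_actions: Sequence[Action],
    expected_improvement: Callable[[Action], float],
) -> Action:
    # Pick the action whose learned estimate of expected future
    # world-model improvement is highest.
    return max(candidate_actions, key=expected_improvement)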

One way to do this is the following (section 4 will describe alternatives): Model the reliability of the predictions of the adaptive predictor as described in section 2. At time $t$, spend reinforcement for the model-building control system in proportion to the current change of reliability of the adaptive predictor. The `curiosity goal' of the control system (it might have additional `pre-wired' goals) is to maximize the expectation of the cumulative sum of future positive or negative changes in prediction reliability.

More formally: The control system's curiosity goal at time $t_0$ is to maximize

\begin{displaymath}E\{ \sum_{t \geq t_0} - \gamma^{t - t_0} \triangle o_C(t+1) \}. \end{displaymath}

Here $0 \leq \gamma < 1$ is a discount factor for avoiding infinite sums, and $\triangle o_C(t)$ is the (positive or negative) change of assumed reliability caused by the observation of $i_M(t)$, $o_M(t)$, and $x(t+1)$.

For instance, if method 1 or method 3 from section 2 is employed, then $\triangle o_C(t) = o_C(t) - \bar{o}_C(t)$, where $\bar{o}_C(t)$ is $C$'s response to $i_M(t)$ after $C$ has been adjusted at time $t$.
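As an illustrative sketch (the function names are placeholders, not from the text), the curiosity reinforcement and the discounted sum above can be computed as follows, assuming the confidence module's reliability estimate for $i_M(t)$ is available both before and after $C$ is adjusted at time $t$:

def curiosity_reward(o_c_before: float, o_c_after: float) -> float:
    # Reinforcement at time t: -Delta o_C(t) = o_C-bar(t) - o_C(t).
    # Positive when the adjusted confidence module C rates the prediction
    # as more reliable than it did before the adjustment.
    delta_o_c = o_c_before - o_c_after
    return -delta_o_c

def discounted_curiosity_return(rewards, gamma: float = 0.9) -> float:
    # Cumulative discounted curiosity reinforcement from t_0 onward:
    # sum over k >= 0 of gamma^k * reward(t_0 + k), with 0 <= gamma < 1.
    return sum(gamma ** k * r for k, r in enumerate(rewards))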

So far, the discussion has not had to refer to a particular reinforcement learning algorithm. Every sensible reinforcement learning algorithm ought to be useful (e.g., [1], [16], [13], [9]). For instance, [6] describes how adaptive critics [1], [15] can be used to build a `curious' model-building control system based on the principle described above. The following subsection focuses on Watkins' recent `Q-learning' method.
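For concreteness, here is a rough sketch (an assumption-laden illustration, not the formulation of the following subsection) of how the curiosity reinforcement defined above could be fed into Watkins' Q-learning for discrete states and actions:

from collections import defaultdict
import random

class CuriousQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # Q-values indexed by (state, action)
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice on top of the curiosity-driven Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, curiosity_reward, next_state):
        # One-step Q-learning update; the reward is the (positive or negative)
        # change in the predictor's assumed reliability, as defined above.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = curiosity_reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])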

