
A CURIOUS SYSTEM BASED ON Q-LEARNING

Here we describe how a reinforcement learning method called Q-learning can be used to build a `curious' model builder. The notation is the same as above. Following [13] we introduce an adaptive function $Q$ for evaluating pairs of inputs $x(t)$ and actions $a(t)$, as well as a utility function $U$ for evaluating inputs $x(t)$.

After random initialization of $C$, $M$, $A$, $U$, and $Q$, at each time step $t$ the following algorithm is performed:



1. Randomly select $p \in [0, \ldots, 1]$. If $\ldots$ $Q(x(t), a) = \max_b Q(x(t), b)$.

$\ldots$

5. $U(x(t)) \leftarrow Q(x(t), a)$.
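The loop can be sketched as follows. This is a minimal tabular illustration, not the paper's exact procedure: the toy environment `step`, the choice of model mismatch as the curiosity reward, and all constants are assumptions, since several steps of the displayed algorithm are elided above.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: states 0..N_STATES-1, actions move
# left/right; transitions are deterministic for simplicity.
N_STATES, ACTIONS = 5, (-1, +1)

def step(x, a):
    return max(0, min(N_STATES - 1, x + a))

# Q[(x, a)]: action values; U[x]: state utilities; M[(x, a)]: the world
# model's prediction of the next state (the model builder M of the text).
Q = defaultdict(float)
U = defaultdict(float)
M = {}

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative constants

def curious_q_step(x):
    # 1. With probability EPSILON pick a random action, otherwise the
    #    greedy one satisfying Q(x, a) = max_b Q(x, b).
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda b: Q[(x, b)])
    # Execute the action and observe the true next state.
    x_next = step(x, a)
    # Curiosity reward (an assumption): mismatch between M's prediction
    # and reality -- 1 if the model was wrong or silent, else 0.
    r = 0.0 if M.get((x, a)) == x_next else 1.0
    # Update the model, then move Q(x, a) toward r + gamma * U(x_next).
    M[(x, a)] = x_next
    Q[(x, a)] += ALPHA * (r + GAMMA * U[x_next] - Q[(x, a)])
    # 5. U(x(t)) <- Q(x(t), a), the algorithm's final step.
    U[x] = Q[(x, a)]
    return x_next

x = 0
for _ in range(200):
    x = curious_q_step(x)
```

Because the reward is the model's surprise, Q-values are high exactly where $M$ still predicts poorly, so the controller is driven toward states it has not yet learned; as $M$ improves, the reward for revisiting them decays toward zero.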

Note that the algorithm does not specify the implementation of $C$, $M$, and $A$. All three can be implemented as lookup tables or (in the hope of useful `generalizations') as back-propagation networks, Boltzmann machines, etc. $Q$ and $U$ may be replaced by back-propagation networks, too (see the experiments described in section 5).
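To illustrate how the $Q$ lookup table might be replaced by a trainable approximator, here is a sketch using a linear model with a one-hot state encoding standing in for a back-propagation network; the sizes, learning rate, and delta-rule update are all illustrative assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # assumed sizes for the sketch
LEARNING_RATE = 0.1

# Linear function approximator in place of the lookup table:
# Q(x, a) = w[a] . phi(x), with a one-hot state encoding phi.
w = np.zeros((N_ACTIONS, N_STATES))

def phi(x):
    v = np.zeros(N_STATES)
    v[x] = 1.0
    return v

def q_value(x, a):
    return float(w[a] @ phi(x))

def q_update(x, a, target):
    # Gradient step toward the target (the delta rule); with a one-hot
    # encoding this reduces to the tabular update, but a distributed
    # encoding would let the approximator generalize across states.
    w[a] += LEARNING_RATE * (target - q_value(x, a)) * phi(x)

q_update(2, 1, 1.0)  # move Q(2, 1) toward the target 1.0
```

With a richer encoding or a multi-layer network, nearby states share weights, which is exactly the kind of `generalization' the text hopes for when replacing the tables.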



Juergen Schmidhuber 2003-02-28

