A CURIOUS SYSTEM BASED ON Q-LEARNING

Here we describe how a reinforcement learning method called Q-learning can be used to build a `curious' model builder. The notation is the same as above. Following [13], we introduce an adaptive function $Q$ for evaluating pairs of inputs and actions as well as a utility function $U$ for evaluating inputs.

After random initialization, at each time step the following algorithm is performed:

1. Randomly select $p \in [0, \ldots, 1]$. If ...

... $Q(x(t), a) = \max_b Q(x(t), b)$.

5. $U(x(t)) \leftarrow Q(x(t), a)$.
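A minimal Python sketch of the action selection in step 1 and the utility update in step 5, under stated assumptions. $Q$ and $U$ are held in lookup tables. The threshold `explore_prob` that $p$ is compared against, the random tie-breaking, and the names `select_action` and `update_utility` are hypothetical choices made for illustration. The steps between 1 and 5 are not reproduced, and the tables start at zero rather than at random values so that the example stays deterministic.

```python
import random
from collections import defaultdict


def select_action(Q, x, actions, explore_prob, rng):
    """Draw p and return either a random action or a Q-maximizing one."""
    p = rng.random()  # uniform draw from [0, 1)
    # ASSUMPTION: explore when p falls below a fixed threshold; the
    # actual condition is not given in the text.
    if p < explore_prob:
        return rng.choice(actions)
    # Greedy choice: an action a with Q(x, a) = max_b Q(x, b).
    best = max(Q[(x, b)] for b in actions)
    candidates = [b for b in actions if Q[(x, b)] == best]
    return rng.choice(candidates)  # ASSUMPTION: ties are broken at random


def update_utility(U, Q, x, a):
    """Step 5: U(x) <- Q(x, a)."""
    U[x] = Q[(x, a)]


# Lookup tables for Q and U (hypothetical toy states and actions).
Q = defaultdict(float)
U = defaultdict(float)
actions = ["left", "right"]
Q[("s0", "right")] = 1.0

rng = random.Random(0)
a = select_action(Q, "s0", actions, explore_prob=0.0, rng=rng)
update_utility(U, Q, "s0", a)
```

With `explore_prob=0.0` the random branch is never taken, so the call reduces to the greedy selection alone. Any value strictly between 0 and 1 mixes random and greedy actions.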

Note that the algorithm does not specify the implementation of its three adaptive components. All three can be implemented as lookup tables or (in the hope of useful `generalizations') as back-propagation networks, Boltzmann machines, etc. (see the experiments described in section 5).

Juergen Schmidhuber 2003-02-28
