Here we describe how a reinforcement learning method called Q-learning can be used to build a `curious' model builder. The notation is the same as above. Following [13], we introduce an adaptive function for evaluating pairs of inputs and actions, as well as a utility function for evaluating inputs .
After random initialization of , , , , and , at each time step the following algorithm is performed:
Note that the algorithm does not specify the implementation of , , and . All three can be implemented as lookup tables or (in the hope of useful `generalizations') as back-propagation networks, Boltzmann machines, etc. and may be replaced by back-propagation networks, too (see the experiments described in section 5).
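To make the lookup-table option concrete, the following is a minimal sketch of a standard tabular Q-learning step. It is an illustration under stated assumptions, not a transcription of the algorithm above. The names `Q`, `U`, `make_tables`, `select_action` and `q_update`, the learning rate `alpha`, the discount factor `gamma` and the epsilon-greedy exploration rule are all hypothetical choices made for this example. The reward is simply passed in by the caller, and the utility table is assumed to be tied to the evaluation table by taking the maximum over actions, as in the usual formulation of Q-learning.

```python
import random


def make_tables(n_inputs, n_actions, rng):
    """Randomly initialise two lookup tables (hypothetical layout).

    Q[x][a] evaluates the pair (input x, action a); U[x] evaluates input x.
    Assumption: U is derived from Q as the maximum over actions.
    """
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_actions)]
         for _ in range(n_inputs)]
    U = [max(row) for row in Q]
    return Q, U


def select_action(Q, x, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[x]))
    row = Q[x]
    return max(range(len(row)), key=row.__getitem__)


def q_update(Q, U, x, a, reward, x_next, alpha=0.5, gamma=0.9):
    """One standard Q-learning step for the transition (x, a) -> x_next.

    Moves Q[x][a] towards reward + gamma * U[x_next], then refreshes U[x].
    Returns the new value of Q[x][a].
    """
    target = reward + gamma * U[x_next]
    Q[x][a] += alpha * (target - Q[x][a])
    U[x] = max(Q[x])
    return Q[x][a]
```

In a control loop one would call `select_action` on the current input, execute the chosen action, obtain a reward and the next input, and then call `q_update`. Replacing the lists by a function approximator such as a back-propagation network changes only how `Q[x][a]` is read and adjusted; the update target stays the same.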