The first module is a `program executer' C, which may be a
neural net (but does not have to be one).
Given a problem, C
emits a sequence of actions in response to its input vector s ∘ g,
the `problem name', where s denotes the start state and g the goal state.
Here ∘ denotes the concatenation operator for vectors.
We assume (1) that there are problems for which C does not
`know' solutions with minimal costs, but (2) that there also are many
problems for which C *does* `know' appropriate action sequences
(otherwise our method will not provide additional efficiency).
C may have learned this by a conventional learning
algorithm - or possibly even by a recursive application of the
principle outlined below.
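
The division between `known' and `unknown' problems can be illustrated by a minimal sketch (all names and the table of known action sequences are hypothetical, for illustration only): a program executer that returns an action sequence for some problem names and nothing for others.

```python
# Hypothetical sketch of a `program executer' C: a table mapping some
# problem names (start, goal) to known action sequences. Problems absent
# from the table are ones C does not `know' how to solve, reflecting
# assumptions (1) and (2) in the text.
known_programs = {
    ("A", "B"): ["right", "right"],  # C `knows' this problem
    ("B", "C"): ["up"],              # and this one
}

def C(start, goal):
    """Return an action sequence for the problem name start ∘ goal, or None."""
    return known_programs.get((start, goal))  # None: no known solution

print(C("A", "B"))  # a `known' problem
print(C("A", "C"))  # unknown: C has no solution with minimal costs
```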

The second module is the evaluator E.
E's input can be the concatenation
s1 ∘ s2 of two states s1 and s2.
E's non-negative output
E(s1 ∘ s2) is interpreted as a
prediction of the *costs* (the negative
reinforcement) for an action sequence (known by C) leading from
s1 to s2. E(s1 ∘ s2) = 0 means minimal expected costs.

E represents a model of C's current abilities. For the purposes of this paper, we need not specify the details of E - it may be an adaptive network (as in [Schmidhuber, 1991a]) as well as any other mapping whose output is differentiable with respect to the input.
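
The only property of E used later is differentiability of its output with respect to its input. A minimal sketch of such a mapping (a fixed random affine map followed by a softplus, purely illustrative and not the paper's model) shows both the non-negative cost prediction and the analytic input gradient:

```python
import numpy as np

# Illustrative evaluator E (not the paper's exact model): an affine map
# followed by softplus, so the output is non-negative and differentiable
# with respect to the concatenated input s1 ∘ s2.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 4))  # two 2-dimensional states, concatenated

def E(s1, s2):
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return np.log1p(np.exp(z))  # softplus, always >= 0

def grad_E(s1, s2):
    """Analytic gradient of E with respect to the input vector s1 ∘ s2."""
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return (1.0 / (1.0 + np.exp(-z))) * W.flatten()  # sigmoid(z) * W

s1, s2 = np.array([0.0, 1.0]), np.array([1.0, 0.0])
print(E(s1, s2))       # predicted costs, always >= 0
print(grad_E(s1, s2))  # input gradient, available for training subgoals
```

The gradient with respect to the input (not just the weights) is what makes such a model usable for adjusting subgoals by gradient descent.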

The third module is the module of interest: the
*adaptive subgoal generator* S.
S is supposed to learn to emit a list of appropriate
subgoals in response to a novel start/goal combination.
Section 4 will present two architectures for S - one
for simultaneous generation of all subgoals,
the other one for sequential generation of the subgoal list.
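
As a trivially simple stand-in for S (the adaptive, learned version is the subject of the paper), one might consider a generator that emits n subgoals by linear interpolation between start and goal; this fixes the interface a learned generator would share:

```python
import numpy as np

# Non-adaptive stand-in for the subgoal generator S, for illustration:
# given start s and goal g, emit n evenly spaced intermediate states.
def S(s, g, n):
    return [s + (i / (n + 1)) * (g - s) for i in range(1, n + 1)]

s, g = np.array([0.0, 0.0]), np.array([4.0, 4.0])
for sub in S(s, g, 3):
    print(sub)  # three subgoals between (0,0) and (4,4)
```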

The i-th sub-goal of the list
(i = 1, ..., n)
is denoted by the vector
ŝ(i), its
j-th
component by ŝ_j(i).
We set
ŝ(0) = s, ŝ(n+1) = g.
Ideally, after training the subgoal list
should fulfill the following condition:

(2)   For all i in {0, ..., n}:  E(ŝ(i) ∘ ŝ(i+1)) = 0.
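
The condition says that every consecutive pair in the chain from start to goal (including the boundary cases ŝ(0) = s and ŝ(n+1) = g) should be a transition whose predicted costs are minimal. A small check of this property, using a hypothetical toy evaluator that returns zero exactly for unit steps, might look like:

```python
# Check condition (2) for a candidate subgoal list: with s_hat(0) = s and
# s_hat(n+1) = g, every consecutive pair should map to (near-)zero
# predicted costs under the evaluator E.
def satisfies_condition(E, s, g, subgoals, tol=1e-6):
    chain = [s] + list(subgoals) + [g]  # s_hat(0), ..., s_hat(n+1)
    return all(E(chain[i], chain[i + 1]) <= tol
               for i in range(len(chain) - 1))

# Hypothetical toy evaluator: unit steps are `known' (zero costs),
# larger jumps are not.
toy_E = lambda a, b: 0.0 if abs(b - a) <= 1 else 1.0

print(satisfies_condition(toy_E, 0, 3, [1, 2]))  # True: each step costs 0
print(satisfies_condition(toy_E, 0, 3, [2]))     # False: a gap of size 2
```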
