The first module is a `program executer' C, which may be a
neural net (but does not have to be one).
Given a problem, C
emits a sequence of actions in response to its input vector s ∘ g,
the `problem name', where s denotes the start state and g the goal state.
Here ∘ denotes the concatenation operator for vectors.
We assume (1) that there are problems for which C does not
`know' solutions with minimal costs, but (2) that there also are many
problems for which C *does* `know' appropriate action sequences
(otherwise our method will not provide additional efficiency).
C may have learned this by a conventional learning
algorithm - or possibly even by a recursive application of the
principle outlined below.
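
The division between `known' and `unknown' problems can be illustrated by a minimal sketch (all names and the table of known action sequences are hypothetical, for illustration only): a program executer that returns an action sequence for some problem names and nothing for others.

```python
# Hypothetical sketch of a `program executer' C: a table mapping some
# problem names (start, goal) to known action sequences. Problems absent
# from the table are ones C does not `know' how to solve, reflecting
# assumptions (1) and (2) in the text.
known_programs = {
    ("A", "B"): ["right", "right"],  # C `knows' this problem
    ("B", "C"): ["up"],              # and this one
}

def C(start, goal):
    """Return an action sequence for the problem name start ∘ goal, or None."""
    return known_programs.get((start, goal))  # None: no known solution

print(C("A", "B"))  # a `known' problem
print(C("A", "C"))  # unknown: C has no solution with minimal costs
```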

The second module is the evaluator E.
E's input can be the concatenation
s1 ∘ s2 of two states s1 and s2.
E's non-negative output
E(s1 ∘ s2) is interpreted as a
prediction of the *costs* (the negative
reinforcement) for an action sequence (known by C) leading from
s1 to s2. E(s1 ∘ s2) = 0 means minimal expected costs.

E represents a model of C's current abilities. For the purposes of this paper, we need not specify the details of E - it may be an adaptive network (as in [Schmidhuber, 1991a]) as well as any other mapping whose output is differentiable with respect to the input.
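
The only property of E used later is differentiability of its output with respect to its input. A minimal sketch of such a mapping (a fixed random affine map followed by a softplus, purely illustrative and not the paper's model) shows both the non-negative cost prediction and the analytic input gradient:

```python
import numpy as np

# Illustrative evaluator E (not the paper's exact model): an affine map
# followed by softplus, so the output is non-negative and differentiable
# with respect to the concatenated input s1 ∘ s2.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 4))  # two 2-dimensional states, concatenated

def E(s1, s2):
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return np.log1p(np.exp(z))  # softplus, always >= 0

def grad_E(s1, s2):
    """Analytic gradient of E with respect to the input vector s1 ∘ s2."""
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return (1.0 / (1.0 + np.exp(-z))) * W.flatten()  # sigmoid(z) * W

s1, s2 = np.array([0.0, 1.0]), np.array([1.0, 0.0])
print(E(s1, s2))       # predicted costs, always >= 0
print(grad_E(s1, s2))  # input gradient, available for training subgoals
```

The gradient with respect to the input (not just the weights) is what makes such a model usable for adjusting subgoals by gradient descent.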

The third module is the module of interest: the
*adaptive subgoal generator* S.
S is supposed to learn to emit a list of appropriate
subgoals in response to a novel start/goal combination.
Section 4 will present two architectures for S - one
for simultaneous generation of all subgoals,
the other one for sequential generation of the subgoal list.
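
As a trivially simple stand-in for S (the adaptive, learned version is the subject of the paper), one might consider a generator that emits n subgoals by linear interpolation between start and goal; this fixes the interface a learned generator would share:

```python
import numpy as np

# Non-adaptive stand-in for the subgoal generator S, for illustration:
# given start s and goal g, emit n evenly spaced intermediate states.
def S(s, g, n):
    return [s + (i / (n + 1)) * (g - s) for i in range(1, n + 1)]

s, g = np.array([0.0, 0.0]), np.array([4.0, 4.0])
for sub in S(s, g, 3):
    print(sub)  # three subgoals between (0,0) and (4,4)
```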

The i-th sub-goal of the list
(i = 1, ..., n)
is denoted by the vector
ŝ(i), its
j-th
component by ŝ_j(i).
We set
ŝ(0) = s, ŝ(n+1) = g.
Ideally, after training the subgoal list
should fulfill the following condition:

(2)   For all i in {0, ..., n}:  E(ŝ(i) ∘ ŝ(i+1)) = 0.
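
The condition says that every consecutive pair in the chain from start to goal (including the boundary cases ŝ(0) = s and ŝ(n+1) = g) should be a transition whose predicted costs are minimal. A small check of this property, using a hypothetical toy evaluator that returns zero exactly for unit steps, might look like:

```python
# Check condition (2) for a candidate subgoal list: with s_hat(0) = s and
# s_hat(n+1) = g, every consecutive pair should map to (near-)zero
# predicted costs under the evaluator E.
def satisfies_condition(E, s, g, subgoals, tol=1e-6):
    chain = [s] + list(subgoals) + [g]  # s_hat(0), ..., s_hat(n+1)
    return all(E(chain[i], chain[i + 1]) <= tol
               for i in range(len(chain) - 1))

# Hypothetical toy evaluator: unit steps are `known' (zero costs),
# larger jumps are not.
toy_E = lambda a, b: 0.0 if abs(b - a) <= 1 else 1.0

print(satisfies_condition(toy_E, 0, 3, [1, 2]))  # True: each step costs 0
print(satisfies_condition(toy_E, 0, 3, [2]))     # False: a gap of size 2
```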
