
BASIC MODULES

Our approach is based on three modules.

The first module is a `program executor' $C$, which may be, but need not be, a neural net. Given a problem $p$, $C$ emits a sequence of actions in response to its input vector, the `problem name' $s^p \circ g^p$, where `$\circ$' denotes the concatenation operator for vectors. We assume (1) that there are problems for which $C$ does not `know' solutions with minimal costs, but (2) that there are also many problems for which $C$ does `know' appropriate action sequences (otherwise our method will not provide additional efficiency). $C$ may have learned these action sequences by a conventional learning algorithm, or possibly even by a recursive application of the principle outlined below.
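To make the input convention concrete, here is a minimal sketch of the `problem name' as a plain vector concatenation. It is paired with a hypothetical executor for a toy point-mass world. The names `problem_name`, `toy_executor`, `m` (state dimension) and `max_step` are illustrative assumptions, not part of the original method. The toy $C$ simply "knows" straight-line action sequences made of bounded displacement steps.

```python
import math
from typing import List, Sequence

Vector = List[float]


def problem_name(start: Sequence[float], goal: Sequence[float]) -> Vector:
    """Concatenation s^p o g^p: the start components followed by the goal components."""
    return list(start) + list(goal)


def toy_executor(name: Sequence[float], m: int, max_step: float = 0.25) -> List[Vector]:
    """Hypothetical program executor C for a point-mass world (an assumption).

    Splits the problem name back into start and goal (each of dimension m) and
    emits identical displacement actions along the straight line; no action
    component exceeds max_step in magnitude.
    """
    start, goal = name[:m], name[m:]
    delta = [g - s for s, g in zip(start, goal)]
    largest = max((abs(d) for d in delta), default=0.0)
    n_steps = max(1, math.ceil(largest / max_step))
    return [[d / n_steps for d in delta] for _ in range(n_steps)]
```

For example, `toy_executor(problem_name([0.0, 0.0], [1.0, 0.0]), m=2)` returns four actions `[0.25, 0.0]`. A learned $C$ would replace the hand-written rule, but its input would still be the concatenated vector.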

The second module is the evaluator $E$. $E$'s input can be the concatenation $s \circ g$ of two states $s$ and $g$. $E$'s non-negative output $eval(s,g) \in R^+_0$ is interpreted as a prediction of the costs ($=$ negative reinforcement) of an action sequence (known by $C$) leading from $s$ to $g$. $eval(s,g) = 0$ means minimal expected costs.
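The following is a minimal sketch of an evaluator with the stated interface. It takes the concatenated vector $s \circ g$ and returns a non-negative cost prediction, with $0$ meaning minimal expected costs. The specific cost model is a hypothetical assumption chosen only for illustration. It models an executor that reaches, at minimal cost, any goal within `radius` of the start, and whose predicted cost grows beyond that distance.

```python
from typing import Sequence


def toy_eval(x: Sequence[float], m: int, radius: float = 1.0) -> float:
    """Hypothetical evaluator E: x is the concatenation s o g (length 2*m).

    Returns max(0, |g - s|^2 - radius^2)^2, which is always >= 0 and equals 0
    whenever g lies within `radius` of s (the assumed competence region of C).
    """
    s, g = x[:m], x[m:]
    sq_dist = sum((gi - si) ** 2 for si, gi in zip(s, g))
    excess = max(0.0, sq_dist - radius ** 2)
    return excess ** 2
```

With this model, a goal at distance $0.5\sqrt{2}$ from the start gets cost $0$, while a goal at distance $2$ gets cost $(4-1)^2 = 9$.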

$E$ represents a model of $C$'s current abilities. For the purposes of this paper, we need not specify the details of $E$: it may be an adaptive network (as in [Schmidhuber, 1991a]) or any other mapping whose output is differentiable with respect to its input.
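The only hard requirement on $E$ here is differentiability with respect to its input. The reason is presumably that gradients of $eval$ can then be passed back to whatever produced $s$ and $g$, such as subgoals. That reading is an assumption about how later sections use $E$. The sketch below uses a hypothetical cost model, $\max(0, \|g-s\|^2 - r^2)^2$, which is differentiable everywhere. It writes out the analytic gradient with respect to the full input $s \circ g$ and checks that gradient against central finite differences.

```python
from typing import Callable, List, Sequence


def toy_eval(x: Sequence[float], m: int, radius: float = 1.0) -> float:
    """Hypothetical evaluator: max(0, |g - s|^2 - radius^2)^2 with x = s o g."""
    s, g = x[:m], x[m:]
    excess = max(0.0, sum((gi - si) ** 2 for si, gi in zip(s, g)) - radius ** 2)
    return excess ** 2


def toy_eval_grad(x: Sequence[float], m: int, radius: float = 1.0) -> List[float]:
    """Analytic gradient of toy_eval with respect to the whole input s o g."""
    s, g = x[:m], x[m:]
    diff = [gi - si for si, gi in zip(s, g)]
    excess = max(0.0, sum(d * d for d in diff) - radius ** 2)
    # d/dg_j = 2 * excess * 2 * (g_j - s_j); d/ds_j has the opposite sign.
    return [-4.0 * excess * d for d in diff] + [4.0 * excess * d for d in diff]


def finite_difference(f: Callable[[List[float]], float], x: Sequence[float],
                      h: float = 1e-6) -> List[float]:
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad
```

At $s = (0,0)$ and $g = (2,0)$, the analytic gradient is $(-24, 0, 24, 0)$, and the finite-difference estimate agrees to within numerical precision.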

The third module is the module of interest: the adaptive subgoal generator $S$. $S$ is supposed to learn to emit a list of appropriate subgoals in response to a novel start/goal combination. Section 4 will present two architectures for $S$: one for simultaneous generation of all subgoals, the other for sequential generation of the subgoal list.

The $i$-th subgoal of the list ($i = 1, \ldots, n$) is denoted by the vector $s^p(i) \in R^m$ and its $j$-th component by $s^p_j(i)$. We set $s^p = s^p(0)$ and $g^p = s^p(n+1)$. Ideally, after training, the subgoal list $s^p(1), s^p(2), \ldots, s^p(n)$ should fulfill the following condition:

\begin{displaymath}
eval(s^p(0),s^p(1)) = eval(s^p(1),s^p(2)) = \ldots = eval(s^p(n),s^p(n+1)) = 0. \qquad (2)
\end{displaymath}
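Condition (2) can be checked mechanically. Form the chain $s^p(0), s^p(1), \ldots, s^p(n+1)$ and evaluate every consecutive pair. The sketch below does this with a hypothetical evaluator `toy_eval(s, g)`, assumed to predict zero cost for any goal within distance `radius` of the start. The helper names and the tolerance are illustrative assumptions.

```python
from typing import List, Sequence


def toy_eval(s: Sequence[float], g: Sequence[float], radius: float = 1.0) -> float:
    """Hypothetical eval(s, g): zero within `radius`, growing beyond it."""
    sq_dist = sum((gi - si) ** 2 for si, gi in zip(s, g))
    return max(0.0, sq_dist - radius ** 2) ** 2


def segment_costs(start: Sequence[float], subgoals: Sequence[Sequence[float]],
                  goal: Sequence[float]) -> List[float]:
    """[eval(s(i), s(i+1)) for i = 0..n] with s(0) = start and s(n+1) = goal."""
    chain = [list(start)] + [list(sg) for sg in subgoals] + [list(goal)]
    return [toy_eval(a, b) for a, b in zip(chain, chain[1:])]


def satisfies_condition_2(costs: Sequence[float], tol: float = 1e-12) -> bool:
    """True if every consecutive pair has (numerically) zero predicted cost."""
    return all(c <= tol for c in costs)
```

For start $(0,0)$ and goal $(3,0)$, the subgoals $(1,0), (2,0)$ give costs `[0.0, 0.0, 0.0]` and satisfy (2). A single subgoal $(1.5,0)$ gives `[1.5625, 1.5625]` and does not.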

Not all environments, however, allow (2) to be achieved; see Section 5.
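The sketch below shows why (2) may be unreachable. It is a simplified stand-in, not the adaptive $S$ of Section 4. For one fixed problem, it runs plain gradient descent directly on the subgoal vectors to minimise $\sum_{i=0}^{n} eval(s^p(i), s^p(i+1))$, using a hypothetical differentiable evaluator. In this toy world, the assumed evaluator treats any step of length at most 1 as free. A start-to-goal distance of 3 then admits a zero-cost list with $n = 2$ subgoals. With $n = 1$, the smallest achievable sum stays positive.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_eval_with_grads(s: Sequence[float], g: Sequence[float],
                        radius: float = 1.0) -> Tuple[float, Vector, Vector]:
    """Hypothetical eval(s, g) = max(0, |g - s|^2 - radius^2)^2 and its gradients."""
    diff = [gi - si for si, gi in zip(s, g)]
    excess = max(0.0, sum(d * d for d in diff) - radius ** 2)
    grad_g = [4.0 * excess * d for d in diff]
    grad_s = [-x for x in grad_g]
    return excess ** 2, grad_s, grad_g


def optimize_subgoals(start: Sequence[float], goal: Sequence[float], n: int,
                      steps: int = 2000, lr: float = 0.01) -> Tuple[List[Vector], Vector]:
    """Gradient descent on n subgoals; s(0) = start and s(n+1) = goal stay fixed."""
    midpoint = [(a + b) / 2.0 for a, b in zip(start, goal)]
    subgoals = [list(midpoint) for _ in range(n)]  # all subgoals start at the midpoint
    for _ in range(steps):
        chain = [list(start)] + subgoals + [list(goal)]
        grads = [[0.0] * len(start) for _ in range(n)]
        for i in range(n + 1):  # segment i runs from chain[i] to chain[i + 1]
            _, grad_s, grad_g = toy_eval_with_grads(chain[i], chain[i + 1])
            if i >= 1:  # chain[i] is subgoal number i (list index i - 1)
                grads[i - 1] = [a + b for a, b in zip(grads[i - 1], grad_s)]
            if i + 1 <= n:  # chain[i + 1] is subgoal number i + 1 (list index i)
                grads[i] = [a + b for a, b in zip(grads[i], grad_g)]
        subgoals = [[x - lr * dx for x, dx in zip(sg, gsg)]
                    for sg, gsg in zip(subgoals, grads)]
    chain = [list(start)] + subgoals + [list(goal)]
    costs = [toy_eval_with_grads(a, b)[0] for a, b in zip(chain, chain[1:])]
    return subgoals, costs
```

Calling `optimize_subgoals([0.0], [3.0], n=2)` drives the subgoals towards $1$ and $2$ and the segment costs towards zero. With `n=1`, the subgoal stays at $1.5$ and the summed cost remains $3.125$. In this toy setting, whether (2) can hold depends on $n$ and on $C$'s assumed competence region.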


Juergen Schmidhuber 2003-03-14
