
ALGORITHMS

With both architectures we apply the chain rule to compute the gradient

\begin{displaymath}
\frac{\partial \sum_{k=1}^{n+1} \frac{1}{2} eval^2(s^p(k-1),s^p(k))}
{\partial W_S},
\end{displaymath}

where $W_S$ denotes the weight vector of $S$. During each training iteration, $W_S$ is changed in proportion to the negative of this gradient (gradient descent on (3)).

With architecture 1, this is essentially done by back-propagating error signals (e.g. [Werbos, 1974], [Parker, 1985], [LeCun, 1985], [Rumelhart et al., 1986]) through copies of the evaluator modules down into the subgoal generator. Loosely speaking, each subgoal receives error signals from the two adjacent copies of $E$. These error signals are added and flow down into $S$, where they cause the appropriate weight changes. One might say that, in general, two `neighboring' evaluator copies (see figure 2) tend to pull their common subgoal in different directions. The iterative process stops when a local or global minimum of (3) is found. This corresponds to an `equilibrium' of the partly conflicting forces originating from the different evaluator copies.
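The `equilibrium' of conflicting forces can be made concrete with a minimal sketch. The evaluator below is a hypothetical stand-in (not the paper's trained network $E$): here $eval(a,b)$ is simply the Euclidean step length, so minimizing (3) by gradient descent on the subgoals themselves pulls each subgoal toward the midpoint of its two neighbors, until the forces from the two adjacent evaluator copies cancel.

```python
import numpy as np

# Hypothetical stand-in for the trained evaluator E: eval(a, b) is the
# Euclidean step length, so objective (3) is sum_k 1/2 * ||s(k)-s(k-1)||^2.
def eval_step(a, b):
    return np.linalg.norm(b - a)

start, goal = np.array([0.0]), np.array([1.0])
n = 3                                            # number of subgoals
rng = np.random.default_rng(0)
subgoals = rng.uniform(0.0, 1.0, size=(n, 1))    # random initialization

lr = 0.1
for _ in range(500):
    path = [start] + list(subgoals) + [goal]
    # Each subgoal s(k) receives gradient contributions from the two
    # adjacent evaluator copies eval(s(k-1), s(k)) and eval(s(k), s(k+1)):
    # d/ds(k) [ 1/2||s(k)-s(k-1)||^2 + 1/2||s(k+1)-s(k)||^2 ]
    #   = (s(k) - s(k-1)) - (s(k+1) - s(k)).
    grads = np.array([(path[k] - path[k - 1]) - (path[k + 1] - path[k])
                      for k in range(1, n + 1)])
    subgoals -= lr * grads                       # descend the negative gradient

# At equilibrium the partly conflicting forces cancel: the subgoals are
# equally spaced between start and goal.
print(subgoals.ravel())                          # close to [0.25 0.5 0.75]
```

In the full architecture the same gradients would not be applied to the subgoals directly but would flow further down into $S$, changing $W_S$.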

The derivation of the more complex algorithm for the recurrent architecture 2 is analogous to the derivation of conventional discrete time recurrent net algorithms (e.g. [Robinson and Fallside, 1987], [Williams, 1989], [Williams and Zipser, in press], [Schmidhuber, 1992]).
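The core of such derivations is back-propagation through time: the recurrent net is unrolled over the discrete time steps, and the gradient contributions of all copies of each shared weight are accumulated. The toy sketch below (a single scalar recurrent unit and a made-up terminal cost, not the paper's architecture 2) illustrates the mechanics, verified against a finite-difference estimate.

```python
import numpy as np

# Minimal backprop-through-time (BPTT) sketch for a scalar recurrent unit
# h(t) = tanh(w * h(t-1) + x(t)) with a toy terminal cost 1/2 * h(T)^2.
# The recurrent subgoal generator's gradient is derived the same way:
# unroll the net over time and sum dC/dw over all copies of the shared w.
def bptt(w, xs, h0=0.0):
    # forward pass, storing all activations for the backward sweep
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + x))
    cost = 0.5 * hs[-1] ** 2

    # backward pass: chain rule through the unrolled copies
    dh = hs[-1]                        # dC/dh(T)
    dw = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)   # through the tanh nonlinearity
        dw += da * hs[t - 1]           # contribution of time step t to dC/dw
        dh = da * w                    # propagate the error to h(t-1)
    return cost, dw

xs = [1.0, -0.3, 0.7]
_, dw = bptt(0.5, xs)

# finite-difference check of the accumulated gradient
eps = 1e-6
num = (bptt(0.5 + eps, xs)[0] - bptt(0.5 - eps, xs)[0]) / (2 * eps)
print(abs(dw - num) < 1e-6)            # prints True
```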



Juergen Schmidhuber 2003-03-14
