
ALGORITHMS

With both architectures we apply the chain rule to compute the gradient

\begin{displaymath}
\frac{\partial \sum_{k=1}^{n+1} \frac{1}{2} eval^2(s^p(k-1),s^p(k))}
{\partial W_S},
\end{displaymath}

where $W_S$ denotes the weight vector of $S$. During each training iteration, $W_S$ is changed in proportion to the negative of this gradient (gradient descent on (3)).

With architecture 1, this is essentially done by back-propagating error signals (e.g. [Werbos, 1974], [Parker, 1985], [LeCun, 1985], [Rumelhart et al., 1986]) through copies of the evaluator modules down into the subgoal generator. Loosely speaking, each subgoal receives error signals from the two adjacent copies of $E$. These error signals are added and flow down into $S$, where they cause the appropriate weight changes. One might say that, in general, two `neighboring' evaluator copies (see figure 2) tend to pull their common subgoal in different directions. The iterative process stops when a local or global minimum of (3) is found. This corresponds to an `equilibrium' of the partly conflicting forces originating from the different evaluator copies.
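The `equilibrium' of conflicting forces can be made concrete with a minimal sketch. The evaluator below is a hypothetical stand-in (not the paper's trained network $E$): here $eval(a,b)$ is simply the Euclidean step length, so minimizing (3) by gradient descent on the subgoals themselves pulls each subgoal toward the midpoint of its two neighbors, until the forces from the two adjacent evaluator copies cancel.

```python
import numpy as np

# Hypothetical stand-in for the trained evaluator E: eval(a, b) is the
# Euclidean step length, so objective (3) is sum_k 1/2 * ||s(k)-s(k-1)||^2.
def eval_step(a, b):
    return np.linalg.norm(b - a)

start, goal = np.array([0.0]), np.array([1.0])
n = 3                                            # number of subgoals
rng = np.random.default_rng(0)
subgoals = rng.uniform(0.0, 1.0, size=(n, 1))    # random initialization

lr = 0.1
for _ in range(500):
    path = [start] + list(subgoals) + [goal]
    # Each subgoal s(k) receives gradient contributions from the two
    # adjacent evaluator copies eval(s(k-1), s(k)) and eval(s(k), s(k+1)):
    # d/ds(k) [ 1/2||s(k)-s(k-1)||^2 + 1/2||s(k+1)-s(k)||^2 ]
    #   = (s(k) - s(k-1)) - (s(k+1) - s(k)).
    grads = np.array([(path[k] - path[k - 1]) - (path[k + 1] - path[k])
                      for k in range(1, n + 1)])
    subgoals -= lr * grads                       # descend the negative gradient

# At equilibrium the partly conflicting forces cancel: the subgoals are
# equally spaced between start and goal.
print(subgoals.ravel())                          # close to [0.25 0.5 0.75]
```

In the full architecture the same gradients would not be applied to the subgoals directly but would flow further down into $S$, changing $W_S$.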

The derivation of the more complex algorithm for the recurrent architecture 2 is analogous to the derivation of conventional discrete time recurrent net algorithms (e.g. [Robinson and Fallside, 1987], [Williams, 1989], [Williams and Zipser, in press], [Schmidhuber, 1992]).
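The core of such derivations is back-propagation through time: the recurrent net is unrolled over the discrete time steps, and the gradient contributions of all copies of each shared weight are accumulated. The toy sketch below (a single scalar recurrent unit and a made-up terminal cost, not the paper's architecture 2) illustrates the mechanics, verified against a finite-difference estimate.

```python
import numpy as np

# Minimal backprop-through-time (BPTT) sketch for a scalar recurrent unit
# h(t) = tanh(w * h(t-1) + x(t)) with a toy terminal cost 1/2 * h(T)^2.
# The recurrent subgoal generator's gradient is derived the same way:
# unroll the net over time and sum dC/dw over all copies of the shared w.
def bptt(w, xs, h0=0.0):
    # forward pass, storing all activations for the backward sweep
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + x))
    cost = 0.5 * hs[-1] ** 2

    # backward pass: chain rule through the unrolled copies
    dh = hs[-1]                        # dC/dh(T)
    dw = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)   # through the tanh nonlinearity
        dw += da * hs[t - 1]           # contribution of time step t to dC/dw
        dh = da * w                    # propagate the error to h(t-1)
    return cost, dw

xs = [1.0, -0.3, 0.7]
_, dw = bptt(0.5, xs)

# finite-difference check of the accumulated gradient
eps = 1e-6
num = (bptt(0.5 + eps, xs)[0] - bptt(0.5 - eps, xs)[0]) / (2 * eps)
print(abs(dw - num) < 1e-6)            # prints True
```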



Juergen Schmidhuber 2003-03-14
