The idea is to take a recurrent sub-goal generator which at a given time step produces only one sub-goal. At the next time step this sub-goal is fed back to the start input of the same sub-goal generator (while the goal input remains constant). To adjust the weights of the sub-goal generator, we can use an algorithm inspired by the `back-propagation through time' method: successive sub-goals have to be fed into copies of the sub-goal generator, as shown in figure 3 (which shows the special case of three sub-goals). Gradient descent requires changing the sub-goal generator's weights according to the sum of all gradients computed for the various copies. (Of course, the weight vector has to remain constant during the credit assignment phase.)
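
As an illustration, here is a minimal NumPy sketch of this unfolding scheme, not the original implementation: the single tanh layer is an assumption, and the quadratic cost standing in for the evaluator's error signal is hypothetical. All copies share the weight matrix W, the per-copy gradients are summed, and W is changed only once the credit assignment phase is over.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4                                  # dimensionality of states and goals
    W = rng.normal(0.0, 0.5, (n, 2 * n))  # shared weights of the sub-goal generator

    def generator(s, g):
        # One copy of the sub-goal generator: (start, goal) -> next sub-goal.
        x = np.concatenate([s, g])
        return np.tanh(W @ x), x

    start = rng.normal(size=n)
    goal = rng.normal(size=n)
    T = 3                                  # three sub-goals, as in figure 3

    # Forward pass: unfold in time.  Every copy uses the same W; the goal
    # input stays constant while each sub-goal is fed back as the next start.
    inputs, subgoals = [], []
    s = start
    for _ in range(T):
        s, x = generator(s, goal)
        inputs.append(x)
        subgoals.append(s)

    # Backward pass through the copies, last one first.  A hypothetical
    # quadratic cost 0.5*||s_t - goal||^2 stands in for the evaluator's
    # error signal at each copy.
    dW = np.zeros_like(W)                  # accumulates the per-copy gradients
    delta = np.zeros(n)                    # gradient arriving at sub-goal s_t
    for t in reversed(range(T)):
        delta = delta + (subgoals[t] - goal)    # cost gradient at this copy
        da = delta * (1.0 - subgoals[t] ** 2)   # back through tanh
        dW += np.outer(da, inputs[t])           # this copy's contribution
        delta = (W.T @ da)[:n]                  # flows into the previous copy

    # W itself stayed constant during the credit assignment phase above;
    # only now is it changed, by the summed gradient.
    W -= 0.1 * dW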

While unfolding the system in time, it is not necessary to build real copies of the sub-goal generator and the evaluator. It suffices if, during activation spreading, each unit in both networks stores its time-varying activations on a stack, from which they are popped during the back-propagation phase.
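
A sketch of this stack mechanism, under the same assumptions as above (the class name StackedLayer is hypothetical): activations are pushed during the forward sweep and popped in reverse order during back-propagation, so no copies of the network are ever materialised.

    import numpy as np

    class StackedLayer:
        # A layer whose units keep their time-varying activations on a
        # stack instead of existing as separate copies per time step.
        def __init__(self, W):
            self.W = W
            self.stack = []            # one (input, activation) pair per step

        def forward(self, x):
            a = np.tanh(self.W @ x)
            self.stack.append((x, a))  # push during activation spreading
            return a

        def backward(self, delta, dW):
            x, a = self.stack.pop()        # pop during back-propagation
            da = delta * (1.0 - a ** 2)    # back through tanh
            dW += np.outer(da, x)          # accumulate this step's gradient
            return self.W.T @ da           # gradient w.r.t. this step's input

Calling forward once per time step and then backward equally often replays the unfolded computation of figure 3 with a single network object.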
