
The following task is representative of a variety of analogous tasks solvable by our method.

Consider figure 1. An `animat' moves in the real plane defined by the $x$-axis and the $y$-axis, producing a trajectory in $R^2$. There are obstacles in the form of circular swamps. As long as the `animat' does not cross a swamp, there are no costs (no negative reinforcement). The $i$-th swamp $S_i$ with center $c_i$ builds the basis of a cone $K_i$ (growing in the third dimension) with tip above $c_i$. Crossing $S_i$ along a trajectory $T$ costs

$C(T) = \int_{T} h(p) \, dp$   (1)

where $h(p) = 0$ if $p$ lies outside of $S_i$; otherwise $h(p)$ is the distance between $p$ and $S_i$'s cone $K_i$ (measured along the line through $p$ perpendicular to the real plane). In figure 1, the grey area indicates the costs associated with a trajectory leading straight through a single swamp.
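The cost definition above can be sketched in code. The following Python snippet is an illustrative sketch only: the function names, the linear cone profile (tip height above the center, falling to zero at the rim), and the midpoint-rule discretization of the integral are assumptions, not taken from the paper. It computes the vertical cone distance for a point inside a circular swamp and approximates the crossing cost along a polyline trajectory:

```python
import math

def swamp_cost_density(p, center, radius, tip_height):
    """Height of the cone measured vertically above point p;
    zero if p lies outside the swamp. (Illustrative sketch:
    a linear cone profile is assumed.)"""
    d = math.dist(p, center)
    if d >= radius:
        return 0.0
    # Cone with tip_height at the center, falling linearly to 0 at the rim.
    return tip_height * (1.0 - d / radius)

def trajectory_cost(points, swamps):
    """Approximate the integral of the cost density along a polyline
    trajectory: sample the density at each segment midpoint and
    weight it by the segment length (midpoint rule).
    swamps is a list of (center, radius, tip_height) triples."""
    cost = 0.0
    for a, b in zip(points, points[1:]):
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        seg_len = math.dist(a, b)
        cost += seg_len * sum(
            swamp_cost_density(mid, c, r, h) for (c, r, h) in swamps
        )
    return cost
```

A trajectory that stays outside every swamp accumulates zero cost under this sketch, matching the cost-free assumption above, while a trajectory cutting through a swamp pays an amount proportional to the shaded area under the cone.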

A problem is defined by a start state $s$ and a goal state $g$. In the above example, start states and goal states are simply given by pairs of Cartesian coordinates. We are looking for an action sequence leading from $s$ to $g$ with minimal costs.

It is true that in theory such sequences could be learned by conventional reinforcement learning algorithms (e.g. [Barto, 1989], [Barto et al., 1983], [Anderson, 1986], [Schmidhuber, 1991b], [Sutton, 1984], [Lin, 1991], [Williams, 1988], [Watkins, 1989]). For the sake of argument, assume that the maximal step size of the `animat' is just a tiny fraction of the obstacle diameter. Then all the above algorithms will take nearly forever to find appropriate cost-free trajectories for other than trivial start/goal combinations. One drawback of conventional algorithms is that they try to learn each new task from scratch, instead of speeding up learning and gaining efficiency by composing new solutions from the solutions of older tasks.

Juergen Schmidhuber 2003-03-14
