Next: BASIC MODULES Up: PLANNING SIMPLE TRAJECTORIES USING Previous: INTRODUCTION

A TYPICAL TASK

The following task is representative of a variety of analoguous tasks solvable by our method.

Consider figure 1. An `animat' moves in the real plane defined by the and axis, producing a trajectory $\omega$ in . There are obstacles in the form of circular swamps. As long as the `animat' does not cross a swamp, there are no costs¹ ( negative reinforcement). The -th swamp $\Phi_i$ with center builds the basis of a cone (growing in the third dimension) with tip . Crossing $\Phi_i$ costs

$\begin{displaymath} \int_{\omega} g(x,y, \Phi_i)~dx~dy, \end{displaymath}$

(1)

where $g(x,y, \Phi_i) = 0$ if

lies outside of $\Phi_i$ , else $g(x,y, \Phi_i)$ is the distance between

and $\Phi_i$ 's cone (measured along the the line through

perpendicular to the real plane). In figure 1, the grey area indicates the costs associated with a trajectory leading straight through a single swamp.

**Figure 1:** An `animat' moving in the plane walks through a swamp. The costs are indicated by the grey area. Check out Schmidhuber's Habilitation thesis for pictures.

A problem is defined by a start state $s^p \in R^m$ and a goal state $g^p \in R^m$ . In the above example, - start states and goal states are simply given by pairs of cartesian coordinates. We are looking for an action sequence leading from to with minimal costs.

It is true that in theory such sequences could be learned by conventional reinforcement learning algorithms (e.g. [Barto, 1989], [Barto et al., 1983], [Anderson, 1986], [Schmidhuber, 1991b], [Sutton, 1984], [Lin, 1991], [Williams, 1988], [Watkins, 1989]). For the sake of argument, assume that the maximal step size of the `animat' is just a tiny fraction of the obstacle diameter. Then all the above algorithms will take nearly forever to find appropriate cost-free trajectories for other than trivial start/goal combinations. One drawback of conventional algorithms is that they will try to learn each new task from scratch, instead of exploiting a possibility for speeding up learning and gaining efficiency by solving new tasks through composition of solutions for older tasks.

Next: BASIC MODULES Up: PLANNING SIMPLE TRAJECTORIES USING Previous: INTRODUCTION

Juergen Schmidhuber 2003-03-14

Back to Subgoal learning - Hierarchical Learning
Pages with Subgoal learning pictures