next up previous
Next: BASIC MODULES Up: PLANNING SIMPLE TRAJECTORIES USING Previous: INTRODUCTION

A TYPICAL TASK

The following task is representative of a variety of analogous tasks solvable by our method.

Consider figure 1. An `animat' moves in the real plane defined by the $x$ and $y$ axes, producing a trajectory $\omega$ in $R^2$. There are obstacles in the form of circular swamps. As long as the `animat' does not cross a swamp, there are no costs ( $=$ negative reinforcement). The $i$-th swamp $\Phi_i$ with center $(x_i,y_i)$ forms the base of a cone (extending into the third dimension) with tip $(x_i,y_i, -h_i)$. Crossing $\Phi_i$ costs

\begin{displaymath}
\int_{\omega} g(x,y, \Phi_i)~dx~dy,
\end{displaymath} (1)

where $g(x,y, \Phi_i) = 0$ if $(x,y)$ lies outside $\Phi_i$; otherwise $g(x,y, \Phi_i)$ is the distance between $(x,y)$ and $\Phi_i$'s cone (measured along the line through $(x,y)$ perpendicular to the real plane). In figure 1, the grey area indicates the costs associated with a trajectory leading straight through a single swamp.
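The cost density $g$ can be computed in closed form: for a point at horizontal distance $d$ from the swamp's center, the cone surface lies at depth $h_i (1 - d/r_i)$, where $r_i$ is the swamp's radius. A minimal sketch (the radius parameter $r$ is an assumption; the text above specifies only the center and the cone depth $h_i$):

```python
import math

def swamp_cost_density(x, y, cx, cy, r, h):
    """Vertical distance from (x, y, 0) down to the cone whose base is
    the circle with center (cx, cy) and radius r, and whose tip lies at
    (cx, cy, -h).  Returns 0 outside the swamp, matching g(x, y, Phi_i)."""
    d = math.hypot(x - cx, y - cy)  # horizontal distance to the center
    if d >= r:
        return 0.0
    # depth of the cone surface at distance d, growing linearly toward the tip
    return h * (1.0 - d / r)
```

At the center the density equals the full cone depth $h$, and it falls off linearly to zero at the swamp's boundary, which produces the triangular grey cost profile shown in figure 1.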

Figure 1: An `animat' moving in the $x, y$ plane walks through a swamp. The costs are indicated by the grey area. Check out Schmidhuber's Habilitation thesis for pictures.

A problem $p$ is defined by a start state $s^p \in R^m$ and a goal state $g^p \in R^m$. In the above example, $m=2$: start states and goal states are simply given by pairs of Cartesian coordinates. We are looking for an action sequence leading from $s^p $ to $g^p$ with minimal costs.
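For the straight-line trajectory of figure 1, the cost integral (1) can be approximated numerically by sampling the cost density along the path. A hedged sketch, assuming each swamp is given as a `(cx, cy, r, h)` tuple (the radius `r` is an assumption not stated in the text) and using a simple midpoint rule:

```python
import math

def straight_line_cost(s, g, swamps, n_steps=1000):
    """Approximate the cost integral (1) for the straight trajectory
    from start state s to goal state g (both (x, y) pairs).
    `swamps` is a list of (cx, cy, r, h) tuples."""
    (x0, y0), (x1, y1) = s, g
    length = math.hypot(x1 - x0, y1 - y0)
    ds = length / n_steps                # arc-length element
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) / n_steps          # midpoint of the k-th segment
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        for (cx, cy, r, h) in swamps:
            d = math.hypot(x - cx, y - cy)
            if d < r:                     # inside the swamp: add cone depth
                total += h * (1.0 - d / r) * ds
    return total
```

A path crossing the center of a unit swamp of depth 1 accumulates cost $\int_{-1}^{1} (1-|x|)\,dx = 1$, while a path avoiding all swamps costs nothing.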

In theory, such sequences could be learned by conventional reinforcement learning algorithms (e.g. [Barto, 1989], [Barto et al., 1983], [Anderson, 1986], [Schmidhuber, 1991b], [Sutton, 1984], [Lin, 1991], [Williams, 1988], [Watkins, 1989]). For the sake of argument, assume that the maximal step size of the `animat' is just a tiny fraction of the obstacle diameter. Then all of the above algorithms will take nearly forever to find appropriate cost-free trajectories for other than trivial start/goal combinations. One drawback of conventional algorithms is that they learn each new task from scratch, instead of speeding up learning and gaining efficiency by composing solutions of previously solved tasks.


Juergen Schmidhuber 2003-03-14
