next up previous
Next: Task Representation and Domain-Specific Up: Experiments: 60 Successive Tasks Previous: On Task-Specific Initialization


Towers of Hanoi

Given are $n$ disks of $n$ different sizes, stacked in decreasing size on the first of three pegs. One may move some peg's top disk to the top of another peg, one disk at a time, but never a larger disk onto a smaller. The goal is to transfer all disks to the third peg. Remarkably, the fastest way of solving this famous problem requires $2^n - 1$ moves $(n \geq 0)$.

The problem is of the reward-only-at-goal type -- given some instance of size $n$, there is no intermediate reward for achieving instance-specific subgoals.

The exponential growth of minimal solution size is what makes the problem interesting: Brute force methods searching in raw solution space will quickly fail as $n$ increases. But the rapidly growing solutions do have something in common, namely, the short algorithm that generates them. Smart searchers will exploit such algorithmic regularities. Once we are searching in general algorithm space, however, it is essential to efficiently allocate time to algorithm tests. This is what OOPS does, in near-bias-optimal incremental fashion.

Untrained humans find it hard to solve instances $n>6$. [1] applied traditional reinforcement learning methods and was able to solve instances up to $n=3$, solvable within at most 7 moves. [28] used learning production systems and was able to solve instances up to $n=5$, solvable within at most 31 moves. (Side note: [3] also applied an alternative reinforcement learner based on the artificial economy by [18] to a simpler 3 peg blocks world problem where any disk may be placed on any other; thus the required number of moves grows only linearly with the number of disks, not exponentially; [27] were able to replicate their results for $n$ up to 5.) Traditional AI planning procedures--compare chapter V by [43] and [25]--do not learn but systematically explore all possible move combinations, using only absolutely necessary task-specific primitives (while OOPS will later use more than 70 general instructions, most of them unnecessary). On current personal computers AI planners tend to fail to solve Hanoi problem instances with $n > 15$ due to the exploding search space (Jana Koehler, IBM Research, personal communication, 2002). OOPS, however, searches program space instead of raw solution space. Therefore, in principle it should be able to solve arbitrary instances by discovering the problem's elegant recursive solution--given $n$ and three pegs $S,A,D$ (source peg, auxiliary peg, destination peg), define the following procedure:

HANOI(S,A,D,n): IF $n=0$ exit; ELSE DO:

call HANOI(S, D, A, n-1);

move top disk from S to D;

call HANOI(A, S, D, n-1).



Subsections
next up previous
Next: Task Representation and Domain-Specific Up: Experiments: 60 Successive Tasks Previous: On Task-Specific Initialization
Juergen Schmidhuber 2004-04-15

Back to OOPS main page