Towers of Hanoi

Given are disks of different sizes, stacked in decreasing size on the first of three pegs. One may move some peg's top disk to the top of another peg, one disk at a time, but never a larger disk onto a smaller. The goal is to transfer all disks to the third peg. Remarkably, the fastest way of solving this famous problem requires moves .

The problem is of the *reward-only-at-goal* type --
given some instance of size , there is no intermediate reward for
achieving instance-specific subgoals.

The exponential growth of minimal solution size is what makes the problem interesting: Brute force methods searching in raw solution space will quickly fail as increases. But the rapidly growing solutions do have something in common, namely, the short algorithm that generates them. Smart searchers will exploit such algorithmic regularities. Once we are searching in general algorithm space, however, it is essential to efficiently allocate time to algorithm tests. This is what OOPS does, in near-bias-optimal incremental fashion.

Untrained humans find it hard to solve instances .
[1]
applied traditional reinforcement learning methods
and was able to solve instances up to , solvable within
at most 7 moves. [28] used
learning production systems and was able to solve instances up to ,
solvable within at most 31 moves.
(*Side note:*
[3] also applied an alternative reinforcement
learner based on the artificial economy by [18]
to a simpler 3 peg blocks world problem
where any disk may be placed on any other;
thus the required number of moves grows only
linearly with the number of disks, not exponentially;
[27] were able to
replicate their results for up to 5.)
Traditional AI planning procedures--compare
chapter V by [43] and [25]--do not
learn but systematically explore all possible move combinations,
using only absolutely necessary task-specific primitives
(while OOPS will later use more than 70 general instructions, most of
them unnecessary). On current personal computers
AI planners tend to fail to solve Hanoi problem instances with
due to the exploding search space (Jana Koehler, IBM Research,
personal communication, 2002).
OOPS, however, searches program space instead of
raw solution space. Therefore, in principle it should be able to solve
arbitrary instances by discovering
the problem's elegant recursive solution--given and
three pegs (source peg, auxiliary peg, destination peg),
define the following procedure:

HANOI(S,A,D,n): IF exit; ELSE DO:

call HANOI(S, D, A, n-1);

move top disk from S to D;

call HANOI(A, S, D, n-1).

Back to OOPS main page