Given are disks of different sizes, stacked in decreasing size on the first of three pegs. One may move some peg's top disk to the top of another peg, one disk at a time, but never a larger disk onto a smaller. The goal is to transfer all disks to the third peg. Remarkably, the fastest way of solving this famous problem requires moves .
The problem is of the reward-only-at-goal type -- given some instance of size , there is no intermediate reward for achieving instance-specific subgoals.
The exponential growth of minimal solution size is what makes the problem interesting: Brute force methods searching in raw solution space will quickly fail as increases. But the rapidly growing solutions do have something in common, namely, the short algorithm that generates them. Smart searchers will exploit such algorithmic regularities. Once we are searching in general algorithm space, however, it is essential to efficiently allocate time to algorithm tests. This is what OOPS does, in near-bias-optimal incremental fashion.
Untrained humans find it hard to solve instances . [1] applied traditional reinforcement learning methods and was able to solve instances up to , solvable within at most 7 moves. [28] used learning production systems and was able to solve instances up to , solvable within at most 31 moves. (Side note: [3] also applied an alternative reinforcement learner based on the artificial economy by [18] to a simpler 3 peg blocks world problem where any disk may be placed on any other; thus the required number of moves grows only linearly with the number of disks, not exponentially; [27] were able to replicate their results for up to 5.) Traditional AI planning procedures--compare chapter V by [43] and [25]--do not learn but systematically explore all possible move combinations, using only absolutely necessary task-specific primitives (while OOPS will later use more than 70 general instructions, most of them unnecessary). On current personal computers AI planners tend to fail to solve Hanoi problem instances with due to the exploding search space (Jana Koehler, IBM Research, personal communication, 2002). OOPS, however, searches program space instead of raw solution space. Therefore, in principle it should be able to solve arbitrary instances by discovering the problem's elegant recursive solution--given and three pegs (source peg, auxiliary peg, destination peg), define the following procedure:
HANOI(S,A,D,n): IF exit; ELSE DO:
call HANOI(S, D, A, n-1);
move top disk from S to D;
call HANOI(A, S, D, n-1).