Despite the near-bias-optimality of OOPS, within reasonable time (a week) on a personal computer, the system with 71 initial instructions was not able to solve instances involving more than 3 disks. What does this mean? Since search time of an optimal searcher is a natural measure of initial bias, it just means that the already nonnegligible bias towards our task set was still too weak.
This actually gives us an opportunity to demonstrate that OOPS can indeed benefit from its incremental learning abilities. Unlike Levin's and Hutter's nonincremental methods OOPS always tries to profit from experience with previous tasks. Therefore, to properly illustrate its behavior, we need an example where it does profit. In what follows, we will first train it on additional, easier tasks, to teach it something about recursion, hoping that the resulting code bias shifts will help to solve the Hanoi tasks as well.
For this purpose we use a seemingly unrelated problem class based on the context free language : given input on the data stack ds, the goal is to place symbols on the auxiliary stack Ds such that the topmost elements are 1's followed by 2's. Again there is no intermediate reward for achieving instance-specific subgoals.
After every executed instruction we test whether the objective has been achieved. By definition, the time cost per test (measured in unit time steps; Section A.2) equals the number of considered elements of Ds. Here we have an example of a test that may become more expensive with instance size.
We add two more instructions to the initial programming language: instruction 1toD() pushes 1 onto Ds, instruction 2toD() pushes 2. Now we have a total of five task-specific instructions (including those for Hanoi), with instruction numbers 1, 2, 3, 4, 5, for 1toD, 2toD, mvdsk, xSA, xAD, respectively, which gives a total of 73 initial instructions.
So we first boost (Section A.2.3) the ``small number pushers'' c1, c2 (Section A.2.1) for the first training phase where the -th task is to solve all problem instances up to . Then we undo the -specific boosts of c1, c2 and boost instead the Hanoi-specific instruction number pushers for the subsequent training phase where the -th task (again ) is to solve all Hanoi instances up to .