** Next:** C-Code
** Up:** Experiments: 60 Successive Tasks
** Previous:** Task Representation and Domain-Specific

##

Incremental Learning: First Solve 30
Simpler Context Free Language Tasks

Despite the near-bias-optimality of OOPS,
within reasonable time (a week) on a
personal computer, the system with 71 initial instructions
was not able to solve
instances involving more than 3 disks.
What does this mean? *Since search time of an optimal searcher is
a natural measure of initial bias,* it just means that the already
nonnegligible bias towards our task set was still too weak.

This actually gives us an opportunity to demonstrate that OOPS
can indeed benefit from its incremental learning abilities.
Unlike Levin's and Hutter's *non*incremental methods OOPS
always tries to profit from experience with previous tasks.
Therefore, to properly illustrate its behavior, we need an example where it *does* profit.
In what follows, we will first train it on additional, easier
tasks, to teach it something about recursion, hoping that the resulting
code bias shifts will help to solve the Hanoi tasks as well.

For this purpose we use a seemingly unrelated problem class based on
the context free language : given input
on the data stack *ds*, the goal is to
place symbols on the auxiliary stack *Ds* such that
the topmost elements are 1's followed by 2's.
Again there is no intermediate reward
for achieving instance-specific subgoals.

After every executed instruction we test whether the objective has been achieved.
By definition, the time cost per test (measured in unit time steps; Section A.2)
equals the number of considered elements of *Ds*.
Here we have an example
of a test that may become more expensive with instance size.

We add two more instructions to the initial programming language:
instruction *1toD()* pushes 1 onto *Ds*, instruction *2toD()* pushes 2.
Now we have a total of five task-specific instructions (including those
for Hanoi), with instruction numbers 1, 2, 3, 4, 5, for
*1toD*, *2toD*, *mvdsk*, *xSA*, *xAD*, respectively,
which gives a total of 73 initial instructions.

So we first boost
(Section A.2.3)
the ``small number pushers'' *c1, c2* (Section A.2.1)
for the first training phase where
the -th task
is to solve all problem
instances up to .
Then we undo the -specific boosts of *c1, c2*
and boost instead
the Hanoi-specific instruction number pushers
for the subsequent training phase where
the -th task (again
)
is to solve all Hanoi instances up to .

** Next:** C-Code
** Up:** Experiments: 60 Successive Tasks
** Previous:** Task Representation and Domain-Specific
Juergen Schmidhuber
2004-04-15

Back to OOPS main page