next up previous
Next: A Particular Initial Programming Up: Optimal Ordered Problem Solver Previous: OOPS Prerequisites: Multitasking &

OOPS For Finding Universal Solvers

Now suppose there is an ordered sequence of tasks $r_1, r_2, \ldots$. Task $r_j$ may or may not depend on solutions for $r_i$ $(i,j=1,2,\ldots, j>i).$ For instance, task $r_j$ may be to find a faster way through a maze than the one found during the search for a solution to task $r_{j-1}$.

We are searching for a single program solving all tasks encountered so far (see [9] for variants of this setup). Inductively suppose we have solved the first $n$ tasks through programs stored below address $a_{frozen}$, and that the most recently found program starting at address $a_{last} \leq a_{frozen}$ actually solves all of them, possibly using information conveyed by earlier programs. To find a program solving the first $n+1$ tasks, OOPS invokes Try as follows (using set notation for ring $R$):

Method 2.2 (OOPS (n+1))   Initialize $T := 2;$ $qp := a_{frozen}$.

1. Set $R = \{ r_{n+1} \}$ and $ip(r_{n+1}) := a_{last}$. IF Try ( $qp, r_{n+1}, R, 0, 0.5$) then exit.

2. IF $n+1 > T$ go to 3. Set $R = \{ r_1, r_2, \ldots, r_{n+1} \}$; set local variable $a := a_{frozen} + 1$; for all $r \in R$ set $ip(r) := a$. IF Try ( $qp, r_{n+1}, R, 0, 0.5$) set $a_{last} := a$ and exit.

3. Set $T := 2 T$, and go to 1.

That is, we spend roughly equal time on two simultaneous searches. The second (step 2) considers all tasks and all prefixes. The first (step 1), however, focuses only on task $n+1$ and the most recent prefix and its possible continuations. In particular, start address $a_{last}$ does not increase as long as new tasks can be solved by prolonging $q_{a_{last}:a_{frozen}}$. Why is this justified? A bit of thought shows that it is impossible for the most recent code starting at $a_{last}$ to request any additional tokens that could harm its performance on previous tasks. We already inductively know that all of its prolongations will solve all tasks up to $n$.

Therefore, given tasks $r_1, r_2, \ldots,$ we first initialize $a_{last}$; then for $i := 1, 2, \ldots $ invoke OOPS$(i)$ to find programs starting at (possibly increasing) address $a_{last}$, each solving all tasks so far, possibly eventually discovering a universal solver for all tasks in the sequence. As address $a_{last}$ increases for the $n$-th time, $q^n$ is defined as the program starting at $a_{last}$'s old value and ending right before its new value. Clearly, $q^m$ ($m>n$) may exploit $q^n$.

Optimality. OOPS not only is asymptotically optimal in Levin's sense [6] (see Method 1.1), but also near bias-optimal (Def. 2.1). To see this, consider a program $p$ solving problem $r_j$ within $k$ steps, given current code bias $q_{0:a_{frozen}}$ and $a_{last}$. Denote $p$'s probability by $P(p)$. A bias-optimal solver would solve $r_j$ within at most $k/P(p)$ steps. We observe that OOPS will solve $r_j$ within at most $2^3k/P(p)$ steps, ignoring overhead: a factor 2 might get lost for allocating half the search time to prolongations of the most recent code, another factor 2 for the incremental doubling of $T$ (necessary because we do not know in advance the best value of $T$), and another factor 2 for Try's resets of states and tasks. So the method is 8-bias-optimal (ignoring hardware-specific overhead) with respect to the current task.

Our only bias shifts are due to freezing programs once they have solved a problem. That is, unlike the learning rate-based bias shifts of ADAPTIVE LSEARCH [10], those of OOPS do not reduce probabilities of programs that were meaningful and executable before the addition of any new $q^i$. Only formerly meaningless, interrupted programs trying to access code for earlier solutions when there weren't any suddenly may become prolongable and successful, once some solutions to earlier tasks have been stored.

Hopefully we have $P(p) >> P(p')$, where $p'$ is among the most probable fast solvers of $r_j$ that do not use previously found code. For instance, $p$ may be rather short and likely because it uses information conveyed by earlier found programs stored below $a_{frozen}$. E.g., $p$ may call an earlier stored $q^i$ as a subprogram. Or maybe $p$ is a short and fast program that copies $q^i$ into state $s(r_j)$, then modifies the copy just a little bit to obtain $\bar{q}^i$, then successfully applies $\bar{q}^i$ to $r_j$. If $p'$ is not many times faster than $p$, then OOPS will in general suffer from a much smaller constant slowdown factor than LSEARCH, reflecting the extent to which solutions to successive tasks do share useful mutual information.

Unlike nonincremental LSEARCH and HSEARCH, which do not require online-generated programs for their aymptotic optimality properties, OOPS does depend on such programs: The currently tested prefix may temporarily rewrite the search procedure by invoking previously frozen code that redefines the probability distribution on its suffixes, based on experience ignored by LSEARCH & HSEARCH (metasearching & metalearning!).

As we are solving more and more tasks, thus collecting and freezing more and more $q^i$, it will generally become harder and harder to identify and address and copy-edit particular useful code segments within the earlier solutions. As a consequence we expect that much of the knowledge embodied by certain $q^j$ actually will be about how to access and edit and use programs $q^i$ ($i<j$) previously stored below $q^j$.

next up previous
Next: A Particular Initial Programming Up: Optimal Ordered Problem Solver Previous: OOPS Prerequisites: Multitasking &
Juergen Schmidhuber 2003-02-25

Back to OOPS home page