
Details of ``Try:'' Bias-Optimal Depth-First Planning in Program Space

To allow us to efficiently undo state changes, we use global Boolean variables $mark_i(r)$ (initially FALSE) for all possible state components $s_i(r)$. We initialize time $t_0 := 0$; probability $P := 1$; q-pointer $qp := a_{frozen}$; and state $s(r)$ -- including $ip(r)$ and $p(r)$ -- with task-specific information for all task names $r$ in a so-called ring $R_0$ of tasks, where the expression ``ring'' indicates that the tasks are ordered in cyclic fashion; $\mid R \mid$ denotes the number of tasks in ring $R$. Given a global search time limit $T$, we Try to solve all tasks in $R_0$, by using existing code in $q=q_{1:qp}$ and/or by discovering an appropriate prolongation of $q$:

Method 4.1 (BOOLEAN Try ( $qp, r_0, R_0, t_0, P$))   ($r_0 \in R_0$; returns TRUE or FALSE; may have the side effect of increasing $a_{frozen}$ and thus prolonging the frozen code $q_{1:a_{frozen}}$):

1. Make an empty stack $\cal S$; set local variables $r := r_0; R := R_0; t:= t_0;$ Done$:=$ FALSE.

WHILE there are unsolved tasks ( $\mid R \mid > 0$) AND there is enough time left ($t \leq PT$) AND the instruction pointer is valid ( $-l(s(r)) \leq ip(r) \leq qp$) AND the instruction is valid ( $1 \leq z(ip(r))(r) \leq n_Q$) AND no halt condition is encountered (e.g., an error such as division by 0, or a robot bumping into an obstacle; the conditions are evaluated in the above order until the first is satisfied, if any) DO:

Interpret / execute token $z(ip(r))(r)$ according to the rules of the given programming language, continually increasing $t$ by the consumed time. This may modify $s(r)$, including instruction pointer $ip(r)$ and distribution $p(r)$, but not code $q$. Whenever the execution changes some state component $s_i(r)$ whose $mark_i(r)=$ FALSE, set $mark_i(r) :=$ TRUE and save the previous value $\hat{s}_i(r)$ by pushing the triple $(i, r, \hat{s}_i(r))$ onto $\cal S$. Remove $r$ from $R$ if it is solved. IF $\mid R \mid > 0$, set $r$ equal to the next task in ring $R$ (as in the round-robin method of standard operating systems). ELSE set Done $:=$ TRUE; $a_{frozen} := qp$ (all tasks solved; new code frozen, if any).

2. Use $\cal S$ to efficiently reset only the modified $mark_i(k)$ to FALSE (the global mark variables will be needed again in step 3), but do not pop $\cal S$ yet.

3. IF $ip(r)=qp+1$ (i.e., if there is an online request for prolongation of the current prefix through a new token): WHILE Done $=$ FALSE AND there is some yet untested token $Z \in Q$ (untried since $t_0$ as value for $q_{qp+1}$) DO:

Set $q_{qp+1}:=Z$ and Done $:=$ Try ( $qp+1, r, R, t, P * p(r)(Z)$), where $p(r)(Z)$ is $Z$'s probability according to current distribution $p(r)$.

4. Use $\cal S$ to efficiently restore only those $s_i(k)$ changed since $t_0$, thus restoring all tapes to their states at the beginning of the current invocation of Try. This will also restore instruction pointer $ip(r_0)$ and original search distribution $p(r_0)$. Return the value of Done.
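The mark-and-stack bookkeeping of steps 1, 2 and 4 can be sketched in C. This is a minimal illustration of the technique, not the paper's actual implementation; the names (`write_state`, `reset_marks`, `restore_state`) and the single-task, fixed-size state are simplifying assumptions.

```c
/* Sketch of Try's state tracking for one task: before the FIRST
   modification of a state component since t_0, its old value is pushed
   onto stack S together with its index; the mark bit prevents duplicate
   saves. All names and sizes are illustrative. */
#include <assert.h>
#include <stdbool.h>

#define N_COMPONENTS 8
#define STACK_MAX    64

static int  s[N_COMPONENTS];      /* state components s_i of the task   */
static bool mark[N_COMPONENTS];   /* mark_i: already saved since t_0?   */

typedef struct { int i; int old; } SaveEntry;
static SaveEntry S[STACK_MAX];    /* the stack S used by Try            */
static int sp = 0;                /* stack pointer                      */

/* Execution writes s_i := v; the old value is saved on first change. */
static void write_state(int i, int v)
{
    if (!mark[i]) {               /* first change since t_0: save it    */
        mark[i] = true;
        S[sp].i = i;
        S[sp].old = s[i];
        sp++;
    }
    s[i] = v;
}

/* Step 2: reset only the modified marks; the stack stays intact. */
static void reset_marks(void)
{
    for (int k = 0; k < sp; k++)
        mark[S[k].i] = false;
}

/* Step 4: pop the stack, restoring only the components changed since
   t_0; the cost is proportional to the number of changes, not to the
   total state size. */
static void restore_state(void)
{
    while (sp > 0) {
        sp--;
        s[S[sp].i] = S[sp].old;
    }
}
```

Note that repeated writes to the same component cost only one stack entry, which is what keeps the undo overhead within a small constant factor of the execution cost.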

A successful Try will solve all tasks, possibly increasing $a_{frozen}$ and prolonging total code $q$. In any case, Try will completely restore all states of all tasks. It never wastes time on recomputing previously computed results of prefixes, on restoring unmodified state components and marks, or on already solved tasks -- tracking / undoing the effects of prefixes essentially costs no more than their execution. So the $n$ in Def. 2.1 of $n$-bias-optimality is not greatly affected by the undoing procedure: we lose at most a factor 2, ignoring hardware-specific overhead such as the costs of single push and pop operations on a given computer, the costs of measuring time, etc.

Since the distributions $p(r)$ are modifiable, we speak of self-generated continuation probabilities. As the variable suffix $q' := q_{a_{frozen}+1 : qp}$ of the total code $q=q_{1:qp}$ is growing, its probability can be readily updated:

\begin{displaymath}
P(q' \mid s^0)= \prod_{i=a_{frozen}+1}^{qp} P^i(q_i \mid s^i),
\end{displaymath} (1)

where $s^0$ is an initial state, and $P^i(q_i \mid s^i)$ is the probability of $q_i$, given the state $s^i$ of the task $r$ whose variable distribution $p(r)$ (as a part of $s^i$) was used to determine the probability of token $q_i$ at the moment it was selected. So we allow the probability of $q_{qp+1}$ to depend on $q_{0:qp}$ and initial state $s^0$ in a fairly arbitrary computable fashion. Note that unlike the traditional Turing machine-based setup of [31] and [7] (always yielding binary programs $q$ with probability $2^{-l(q)}$), this framework of self-generated continuation probabilities allows for token selection probabilities close to 1.0, that is, even long programs may have high probability.
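Concretely, the product in Eq. (1) accumulates one factor per token of the variable suffix; in Method 4.1 this is maintained incrementally by passing $P * p(r)(Z)$ into each recursive call. A small sketch, where the array `token_prob` is an illustrative stand-in for the per-token probabilities $P^i(q_i \mid s^i)$ that were in effect when each token was selected:

```c
#include <assert.h>
#include <math.h>

/* Probability of the variable suffix q' = q_{a_frozen+1 : qp} as the
   product of the token probabilities that held at selection time,
   following Eq. (1). Indices mirror the paper's 1-based code addresses;
   token_prob[i] stands for P^i(q_i | s^i). */
static double suffix_probability(const double *token_prob,
                                 int a_frozen, int qp)
{
    double P = 1.0;
    for (int i = a_frozen + 1; i <= qp; i++)
        P *= token_prob[i];
    return P;
}
```

Since tokens forced by the syntax (probability 1.0) contribute factors of 1, long but highly constrained programs retain high probability, unlike in the uniform $2^{-l(q)}$ setting.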

Example. In many programming languages the probability of token ``('', given a previous token ``WHILE'', equals 1. Having observed the ``('', there is not much new code to execute yet -- in such cases the rules of the programming language will typically demand another increment of instruction pointer $ip(r)$, which will lead to the request of another token through a subsequent increment of the topmost code address. However, once we have observed a complete expression of the form ``WHILE (condition) DO (action),'' it may take a long time until the conditional loop -- interpreted via $ip(r)$ -- is exited and the top address is incremented again, thus asking for a new token.

The round-robin Try variant above keeps circling through all unsolved tasks, executing one instruction at a time. Alternative Try variants could instead work on each task sequentially until it is solved, then try to prolong the resulting $q$ on the next task, and so on, appropriately restoring previous tasks once it turns out that the current task cannot be solved through prolongation of the prefix solving the earlier tasks. One potential advantage of round-robin Try is that it will quickly discover whether the currently studied prefix causes an error for at least one task, in which case it can be discarded immediately.

Nonrecursive C-Code. An efficient iterative (nonrecursive) version of Try for a broad variety of initial programming languages was implemented in C. Instead of local stacks $\cal S$, a single global stack is used to save and restore old contents of modified cells of all tapes / tasks.
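One plausible way to organize such a single global stack -- an illustration of the general technique, not the paper's actual C code -- is to record the stack pointer at each (possibly nested) Try invocation as a frame base, and on backtracking pop only the entries pushed since then. All names below are assumptions:

```c
#include <assert.h>

/* Sketch of one global stack shared by all nested Try invocations and
   all tapes/tasks: each saved entry records (task, component, old
   value); a frame base marks where an invocation's entries begin. */
#define STACK_MAX    256
#define N_COMPONENTS 8

typedef struct { int task; int i; int old; } SaveEntry;
static SaveEntry G[STACK_MAX];   /* single global stack */
static int gsp = 0;              /* global stack pointer */

/* Remember the frame base on entry to a Try invocation. */
static int enter_frame(void) { return gsp; }

/* Save the old value of component i of the given task's state. */
static void push_old(int task, int i, int old)
{
    G[gsp].task = task;
    G[gsp].i    = i;
    G[gsp].old  = old;
    gsp++;
}

/* On backtracking, restore exactly the entries pushed since `base`,
   leaving enclosing invocations' saved values untouched. */
static void leave_frame(int base, int states[][N_COMPONENTS])
{
    while (gsp > base) {
        gsp--;
        states[G[gsp].task][G[gsp].i] = G[gsp].old;
    }
}
```

This preserves the key property of the recursive version: undoing a failed prefix touches only the cells that prefix actually modified.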

Juergen Schmidhuber 2004-04-15
