next up previous
Next: Details of ``Try:'' Bias-Optimal Up: Multitasking & Prefix Tracking Previous: Multitasking & Prefix Tracking


Overview of ``Try''

We assume an initial set of user-defined primitive behaviors reflecting prior knowledge and assumptions of the user. Primitives may be assembler-like instructions or time-consuming software, such as, say, theorem provers, or matrix operators for neural network-like parallel architectures, or trajectory generators for robot simulations, or state update procedures for multiagent systems, etc. Each primitive is represented by a token in Q. It is essential that those primitives whose runtimes are not known in advance can be interrupted by OOPS at any time.

The searcher's initial bias is also affected by initial, user-defined, task-dependent probability distributions on the finite or infinite search space of possible self-delimiting program prefixes. In the simplest case we start with a maximum entropy distribution on the tokens, and define prefix probabilities as the products of the probabilities of their tokens. But prefix continuation probabilities may also depend on previous tokens in context sensitive fashion defined by a probabilistic syntax diagram. In fact, we even permit that any executed prefix assigns a task-dependent, self-computed probability distribution to its own possible suffixes (compare Section 3.1).

Consider the left-hand side of Figure 1. All instruction pointers $ip(r)$ of all current tasks $r$ are initialized by some address, typically below the topmost code address, thus accessing the code bias common to all tasks, and/or using task-specific code fragments written into tapes. All tasks keep executing their instructions in parallel until one of them is interrupted or all tasks are solved, or until some task's instruction pointer points to the yet unused address right after the topmost code address. The latter case is interpreted as a request for code prolongation through a new token, where each token has a probability according to the present task's current state-encoded distribution on the possible next tokens. The deterministic method Try systematically examines all possible code extensions in a depth-first fashion (probabilities of prefixes are just used to order them for runtime allocation). Interrupts and backtracking to previously selected tokens (with yet untested alternatives) and the corresponding partial resets of states and task sets take place whenever one of the tasks encounters an error, or the product of the task-dependent probabilities of the currently selected tokens multiplied by the sum of the runtimes on all tasks exceeds a given total search time limit $T$.

To allow for efficient backtracking, Try tracks effects of tested program prefixes, such as task-specific state modifications (including probability distribution changes) and partially solved task sets, to reset conditions for subsequent tests of alternative, yet untested prefix continuations in an optimally efficient fashion (at most as expensive as the prefix tests themselves).

Since programs are created online while they are being executed, Try will never create impossible programs that halt before all their tokens are read. No program that halts on a given task can be the prefix of another program halting on the same task. It is important to see, however, that in our setup a given prefix that has solved one task (to be removed from the current task set) may continue to demand tokens as it tries to solve other tasks.

Figure 1: Storage snapshot during an OOPS application. Left: general picture (Section 3). Right: language-specific details for a particular FORTH-like programming language (Sections A and 6). Left: current code $q$ ranges from addresses $1$ to $qp$ and includes previously frozen programs $q^1,q^2,q^3$. Three unsolved tasks require three tapes (lower left) with addresses $-3000$ to $0$. Instruction pointers $ip(1)$ and $ip(3)$ point to code in $q$, $ip(2)$ to code on the 2nd tape. Once, say, $ip(3)$ points right above the topmost address $qp$, the probability of the next instruction (at $qp+1$) is determined by the current probability distribution $p(3)$ on the possible tokens. OOPS spends equal time on programs starting with prefix $q_{a_{last}:a_{frozen}}$ (tested only on the most recent task, since such programs solve all previous tasks, by induction), and on all programs starting at $a_{frozen}+1$ (tested on all tasks). Right: Details of a single tape. There is space for several alternative self-made probability distributions on $Q$, each represented by $n_Q$ numerators and a common denominator. The pointer curp determines which distribution to use for the next token request. There is a stack fns of self-made function definitions, each with a pointer to its code's start address, and its numbers of input arguments and return values expected on data stack ds (with stack pointer dp). The dynamic runtime stack cs handles all function calls. Its top entry holds the current instruction pointer ip and the current base pointer into ds below the arguments of the most recent call. There is also space for an auxiliary stack Ds, and for representing modifiable aspects of the environment.
\begin{figure}\centerline{\psfig{figure=storage7.eps,angle=0,height=16cm}}\end{figure}





Figure 2: Search tree during an OOPS application; compare Section 4.1 and Figure 1. Search tree during an OOPS application. The tree branches are program prefixes. A single prefix may modify several task-specific tapes in parallel. Nodes represent primitive instructions; widths of connections between nodes stand for temporary, task-specific transition probabilities encoded (yellow) on the modifiable computation tapes (white, to the right; one tape per task; each tape has an instruction pointer). Prefixes may contain (or call previously frozen) subprograms that dynamically modify the conditional probabilities during runtime, thus rewriting the suffix search procedure. In the example, the currently tested prefix (pink, above the previously frozen codes) consists of the token sequence "while (x /< y) call fn17" (real values in the tree denote transition probabilities). Here fn17 might be a time-consuming instruction, say, for manipulating the arm of a realistic virtual robot. Before requesting an additional token, this prefix may run for a long time, thus changing many components of numerous tapes. Node-oriented backtracking will restore partially solved task sets and modifications of internal states and continuation probabilities once there is an error or the sum of the runtimes of the current prefix on all current tasks exceeds the prefix probability multiplied multiplied by the current search time limit, which keeps doubling until the current task set is solved. See text for details.


next up previous
Next: Details of ``Try:'' Bias-Optimal Up: Multitasking & Prefix Tracking Previous: Multitasking & Prefix Tracking
Juergen Schmidhuber 2004-04-15

Back to OOPS main page