Next: Details of ``Try:'' Bias-Optimal
Up: Multitasking & Prefix Tracking
Previous: Multitasking & Prefix Tracking
Overview of ``Try''
We assume an initial set of user-defined primitive behaviors
reflecting prior knowledge and assumptions of the user.
Primitives may be assembler-like instructions or
time-consuming software, such as, say, theorem provers,
or matrix operators for neural network-like parallel architectures,
or trajectory generators for robot simulations,
or state update procedures for multiagent systems,
etc. Each primitive is represented by a token
in Q.
It is essential that those primitives
whose runtimes are not known in advance
can be interrupted by OOPS at any time.
The searcher's initial bias is also affected
by initial, user-defined, task-dependent probability
distributions on the finite or infinite search space of
possible self-delimiting program prefixes. In the simplest case we start
with a maximum entropy distribution on the tokens, and
define prefix probabilities as the products of the
probabilities of their tokens.
But prefix continuation probabilities may
also depend on previous tokens in
context sensitive fashion defined by a probabilistic
syntax diagram. In fact, we even permit that any executed prefix
assigns a task-dependent, self-computed
probability distribution to its own possible suffixes
(compare Section 3.1).
Consider the left-hand side of Figure 1.
All instruction pointers of all
current tasks are initialized by some address,
typically below the topmost code address,
thus accessing the code bias common to all tasks,
and/or using task-specific code fragments written
into tapes.
All tasks keep executing their instructions in parallel
until one of them is interrupted or all tasks are solved,
or until some task's instruction pointer points to the yet unused
address right after the
topmost code address. The latter case
is interpreted as a request for code prolongation through a new token,
where each token has a probability
according to the present task's current state-encoded
distribution on the possible next tokens.
The deterministic method Try systematically
examines all possible code extensions in a depth-first fashion
(probabilities of prefixes are just used to order them for runtime
allocation).
Interrupts and backtracking to previously selected tokens
(with yet untested alternatives) and the corresponding partial resets
of states and task sets
take place whenever one of the tasks encounters an error,
or the product of the task-dependent probabilities of the currently selected tokens
multiplied by the sum of the runtimes on all tasks exceeds a given
total search time limit .
To allow for efficient backtracking,
Try tracks effects of tested program prefixes,
such as task-specific state modifications (including probability distribution changes)
and partially solved task sets, to reset
conditions for subsequent tests of alternative, yet untested
prefix continuations in an optimally efficient fashion
(at most as expensive as the prefix tests themselves).
Since programs are created online while they are being executed,
Try will never create impossible
programs that halt before all their tokens are read.
No program that halts on a given task can be
the prefix of another program halting on the same task.
It is important to see, however, that
in our setup a given prefix that has solved one task
(to be removed from the current task set)
may continue to demand tokens as it tries to solve other tasks.
Figure 1:
Storage snapshot during an OOPS application.
Left: general picture (Section 3). Right: language-specific
details for a particular
FORTH-like programming language (Sections A and 6).
Left:
current code ranges from addresses to and includes
previously frozen programs .
Three unsolved tasks require three tapes (lower left) with addresses to .
Instruction pointers and point to code in , to code
on the 2nd tape.
Once, say, points right above the topmost address ,
the probability of the next instruction (at )
is determined by the current probability
distribution on the possible tokens.
OOPS spends equal time on programs
starting with prefix
(tested only on the most recent task, since such programs
solve all previous tasks, by induction),
and on all programs starting at (tested on all tasks).
Right: Details of a single tape.
There is space for several alternative self-made probability distributions on ,
each represented by numerators and a common denominator.
The pointer curp
determines
which distribution to use for the next token request.
There is a stack fns of self-made function definitions, each with a pointer to
its code's start address, and its numbers of input arguments and return values expected
on data stack ds (with stack pointer dp). The
dynamic runtime
stack cs
handles all function calls.
Its top
entry holds the current instruction pointer ip and
the current base pointer into ds below the arguments of the most recent
call. There is also space for an auxiliary stack Ds, and for
representing modifiable aspects of the environment.
|
Figure 2:
Search tree during an OOPS application; compare Section 4.1 and
Figure 1.
Search tree during an OOPS application. The tree branches are program prefixes. A single prefix may modify several task-specific tapes in parallel. Nodes represent primitive instructions; widths of connections between nodes stand for temporary, task-specific transition probabilities encoded (yellow) on the modifiable computation tapes (white, to the right; one tape per task; each tape has an instruction pointer). Prefixes may contain (or call previously frozen) subprograms that dynamically modify the conditional probabilities during runtime, thus rewriting the suffix search procedure. In the example, the currently tested prefix (pink, above the previously frozen codes) consists of the token sequence "while (x /< y) call fn17" (real values in the tree denote transition probabilities). Here fn17 might be a time-consuming instruction, say, for manipulating the arm of a realistic virtual robot. Before requesting an additional token, this prefix may run for a long time, thus changing many components of numerous tapes. Node-oriented backtracking will restore partially solved task sets and modifications of internal states and continuation probabilities once there is an error or the sum of the runtimes of the current prefix on all current tasks exceeds the prefix probability multiplied multiplied by the current search time limit, which keeps doubling until the current task set is solved.
See text for details.
Next: Details of ``Try:'' Bias-Optimal
Up: Multitasking & Prefix Tracking
Previous: Multitasking & Prefix Tracking
Juergen Schmidhuber
2004-04-15
Back to OOPS main page