Optimal Ordered Problem Solver (OOPS)

Notation. Unless stated otherwise or obvious, to simplify notation, throughout the paper newly introduced variables are assumed to be integer-valued and to cover the range clear from the context. Given some finite or infinite countable alphabet $Q=\{Q_1,Q_2, \ldots \}$, let $Q^*$ denote the set of finite sequences or strings over $Q$, where $\lambda$ is the empty string. We use the alphabet name's lower case variant to introduce (possibly variable) strings such as $q,q^1,q^2, \ldots \in Q^*$; $l(q)$ denotes the number of symbols in string $q$, where $l(\lambda) = 0$; $q_n$ is the $n$-th symbol of $q$; $q_{m:n}= \lambda$ if $m > n$ and $q_m q_{m+1} \ldots q_n$ otherwise (where $q_0 := q_{0:0} := \lambda$). $q^1q^2$ is the concatenation of $q^1$ and $q^2$ (e.g., if $q^1=abc$ and $q^2=dac$ then $q^1q^2 = abcdac$).
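The string notation above can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names `length`, `symbol`, and `substring` are hypothetical helpers introduced here, strings are modeled as Python `str` values, and the example strings are arbitrary.

```python
def length(q: str) -> int:
    """l(q): number of symbols in q; the empty string has length 0."""
    return len(q)


def symbol(q: str, n: int) -> str:
    """q_n: the n-th symbol of q, counting from 1."""
    if not 1 <= n <= len(q):
        raise IndexError("symbol index out of range")
    return q[n - 1]


def substring(q: str, m: int, n: int) -> str:
    """q_{m:n}: empty if m > n, else q_m ... q_n (1-indexed, inclusive).

    Index 0 contributes nothing, matching q_0 := q_{0:0} := lambda.
    """
    if m < 0 or n > len(q):
        raise IndexError("substring bounds out of range")
    if m > n:
        return ""
    return q[max(m, 1) - 1 : n]


q1, q2 = "abc", "dac"   # arbitrary example strings
q12 = q1 + q2           # concatenation q^1 q^2
```

Under these assumptions, `substring(q12, 2, 4)` picks out the second through fourth symbols of the concatenation, and `substring(q1, 3, 2)` is empty because $m > n$.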

Consider countable alphabets $S$ and $Q$. Strings $s,s^1,s^2, \ldots \in S^*$ represent possible internal states of a computer; strings $q,q^1,q^2, \ldots \in Q^*$ represent code or programs for manipulating states. We focus on $S$ being the set of integers and $Q := \{ 1, 2, \ldots, n_Q \}$ representing a set of $n_Q$ instructions of some programming language (that is, substrings within states may also encode programs).

$R$ is a set of currently unsolved tasks. Let the variable $s(r) \in S^*$ denote the current state of task $r \in R$, with $i$-th component $s_i(r)$ on a computation tape $r$ (think of a separate tape for each task). For convenience we combine current state $s(r)$ and current code $q$ in a single address space, introducing negative and positive addresses ranging from $-l(s(r))$ to $l(q)+1$, defining the content of address $i$ as $z(i)(r) := q_i$ if $0 < i \leq l(q)$ and $z(i)(r) := s_{-i}(r)$ if $-l(s(r)) \leq i \leq 0$. All dynamic task-specific data will be represented at non-positive addresses. In particular, the current instruction pointer $ip(r) := z(a_{ip}(r))(r)$ of task $r$ can be found at (possibly variable) address $a_{ip}(r) \leq 0$. Furthermore, $s(r)$ also encodes a modifiable probability distribution $p(r) = \{ p_1(r), p_2(r), \ldots, p_{n_Q}(r) \}$ $(\sum_i p_i(r) = 1)$ on $Q$. This variable distribution will be used to select a new instruction in case $ip(r)$ points to the current topmost address right after the end of the current code $q$.
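The combined address space can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: code $q$ and one task's state $s(r)$ are modeled as Python lists holding $q_1, q_2, \ldots$ and $s_1(r), s_2(r), \ldots$; address 0 maps to $s_0(r)$, which the notation treats as empty, so the sketch returns `None` there; and the hypothetical helper `fetch` assumes that a newly selected instruction is appended to $q$.

```python
import random
from typing import List, Optional


def z(i: int, q: List[int], s: List[int]) -> Optional[int]:
    """Content of address i for one task, given code q and that task's state s."""
    if 0 < i <= len(q):
        return q[i - 1]          # positive addresses hold code: q_i
    if -len(s) <= i < 0:
        return s[-i - 1]         # negative addresses hold state: s_{-i}
    if i == 0:
        return None              # s_0 is empty under the 1-indexed notation
    raise IndexError("address outside the combined address space")


def fetch(ip: int, q: List[int], p: List[float], rng: random.Random) -> int:
    """Return the instruction at ip; if ip is the address right after the
    end of q, draw a new instruction from p (assumption: it extends q)."""
    if ip == len(q) + 1:
        new_instruction = rng.choices(range(1, len(p) + 1), weights=p, k=1)[0]
        q.append(new_instruction)
        return new_instruction
    if 0 < ip <= len(q):
        return q[ip - 1]
    raise IndexError("instruction pointer does not address code")


q = [3, 1, 2]        # example code over Q = {1, 2, 3}
s = [7, -5, 9]       # example state of a single task
```

Here `z(2, q, s)` reads code symbol $q_2$, `z(-3, q, s)` reads state component $s_3(r)$, and `fetch(len(q) + 1, q, p, rng)` is the only call that consults the distribution `p`.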

$a_{frozen} \geq 0$ is a variable address that cannot decrease. Once chosen, the code bias $q_{0:a_{frozen}}$ will remain unchangeable forever -- it is a (possibly empty) sequence of programs $q^1q^2 \ldots$, some of them prewired by the user, others frozen after previous successful searches for solutions to previous tasks. Given $R$, the goal is to solve all tasks $r \in R$ by a program that appropriately uses or extends the current code $q_{0:a_{frozen}}$.
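The role of $a_{frozen}$ can be illustrated with a small container that refuses to touch the frozen prefix. The class `CodeStore` and its methods are hypothetical names introduced for this sketch; in particular, freezing exactly at the current end of $q$ is an assumption made here for simplicity.

```python
from typing import Iterable, List


class CodeStore:
    """Code q with a frozen prefix q_{0:a_frozen} that can only grow."""

    def __init__(self, prewired: Iterable[int] = ()) -> None:
        self.q: List[int] = list(prewired)
        self.a_frozen: int = len(self.q)   # prewired code starts out frozen

    def append(self, instruction: int) -> None:
        """Extend the code beyond the frozen prefix."""
        self.q.append(instruction)

    def truncate(self, new_length: int) -> None:
        """Discard non-frozen code, e.g. when a candidate is abandoned."""
        if new_length < self.a_frozen:
            raise ValueError("cannot modify frozen code")
        del self.q[new_length:]

    def freeze(self) -> None:
        """Freeze everything written so far; a_frozen never decreases,
        because truncate keeps len(q) >= a_frozen."""
        self.a_frozen = len(self.q)


store = CodeStore(prewired=[4, 2])
store.append(7)
store.append(1)
store.truncate(3)   # drop the last, unsuccessful instruction
store.freeze()      # assume the remaining code solved a task
```

After these calls the frozen prefix has grown from two to three symbols, and any later attempt to truncate below that length raises an error.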

We will do this in a bias-optimal fashion, that is, no solution candidate will get much more search time than it deserves, given some initial probabilistic bias on program space $Q^*$:

Definition 2.1 (BIAS-OPTIMAL SEARCHERS) Given is a problem class $\cal R$, a search space $\cal C$ of solution candidates (where any problem $r \in \cal R$ should have a solution in $\cal C$), a task-dependent bias in form of conditional probability distributions $P(q \mid r)$ on the candidates $q \in \cal C$, and a predefined procedure that creates and tests any given $q$ on any $r \in \cal R$ within time $t(q,r)$ (typically unknown in advance). A searcher is $n$-bias-optimal ($n \geq 1$) if for any maximal total search time $T_{max} > 0$ it is guaranteed to solve any problem $r \in \cal R$ if it has a solution $p \in \cal C$ satisfying $t(p,r) \leq P(p \mid r)~T_{max}/n$.
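To make the condition in Definition 2.1 concrete, the sketch below checks the inequality $t(p,r) \leq P(p \mid r)~T_{max}/n$ and runs a deliberately simple searcher for a single fixed task. It rests on assumptions that are not part of the definition: the candidate set is finite, $T_{max}$ is known in advance, and the create-and-test procedure is modeled by a hypothetical callback `solves_within(q, budget)`. Under those assumptions, giving each candidate $q$ at most $P(q \mid r)~T_{max}$ time keeps the total at or below $T_{max}$ and meets the condition for $n = 1$.

```python
from typing import Callable, Dict, Optional


def within_bound(t: float, prob: float, t_max: float, n: float = 1.0) -> bool:
    """True if runtime t satisfies t <= prob * t_max / n."""
    return t <= prob * t_max / n


def budgeted_search(
    bias: Dict[str, float],
    solves_within: Callable[[str, float], bool],
    t_max: float,
) -> Optional[str]:
    """Give each candidate q at most bias[q] * t_max time; return the first
    candidate that solves the task, or None if none does."""
    for q, prob in bias.items():
        if solves_within(q, prob * t_max):
            return q
    return None


# Illustrative data: bias P(q | r) and true runtimes t(q, r) for one task r.
bias = {"a": 0.5, "b": 0.25, "c": 0.125}
runtime = {"a": None, "b": 20.0, "c": 30.0}   # None: "a" never solves r


def solves_within(q: str, budget: float) -> bool:
    t = runtime[q]
    return t is not None and t <= budget


found = budgeted_search(bias, solves_within, t_max=100.0)
missed = budgeted_search(bias, solves_within, t_max=40.0)
```

With `t_max=100.0`, candidate `"b"` receives a budget of 25 time units, which covers its runtime of 20, so it is found. With `t_max=40.0` no candidate satisfies the inequality and the search returns `None`, which is consistent with the definition: a guarantee is required only for solutions within the bound.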