next up previous
Next: OOPS Prerequisites: Multitasking & Up: Bias-Optimal Incremental Problem Solving Previous: Brief Introduction to Optimal


Optimal Ordered Problem Solver (OOPS)

Notation. Unless stated otherwise or obvious, to simplify notation, throughout the paper newly introduced variables are assumed to be integer-valued and to cover the range clear from the context. Given some finite or infinite countable alphabet $Q=\{Q_1,Q_2, \ldots \}$, let $Q^*$ denote the set of finite sequences or strings over $Q$, where $\lambda$ is the empty string. We use the alphabet name's lower case variant to introduce (possibly variable) strings such as $q,q^1,q^2, \ldots \in Q^*$; $l(q)$ denotes the number of symbols in string $q$, where $l(\lambda) = 0$; $q_n$ is the $n$-th symbol of $q$; $q_{m:n}= \lambda$ if $m>n$ and $q_m q_{m+1} \ldots q_n$ otherwise (where $q_0 := q_{0:0} := \lambda$). $q^1q^2$ is the concatenation of $q^1$ and $q^2$ (e.g., if $q^1=abc$ and $q^2=dac$ then $q^1q^2 = abcdac$).

Consider countable alphabets $S$ and $Q$. Strings $s,s^1,s^2, \ldots \in S^*$ represent possible internal states of a computer; strings $q,q^1,q^2, \ldots \in Q^*$ represent code or programs for manipulating states. We focus on $S$ being the set of integers and $Q := \{ 1, 2, \ldots, n_Q \}$ representing a set of $n_Q$ instructions of some programming language (that is, substrings within states may also encode programs).

$R$ is a set of currently unsolved tasks. Let the variable $s(r) \in S^*$ denote the current state of task $r \in R$, with $i$-th component $s_i(r)$ on a computation tape $r$ (think of a separate tape for each task). For convenience we combine current state $s(r)$ and current code $q$ in a single address space, introducing negative and positive addresses ranging from $-l(s(r))$ to $l(q)+1$, defining the content of address $i$ as $z(i)(r) := q_i$ if $0 < i \leq l(q)$ and $z(i)(r) := s_{-i}(r)$ if $-l(s(r)) \leq i \leq 0$. All dynamic task-specific data will be represented at non-positive addresses. In particular, the current instruction pointer ip(r) $:= z(a_{ip}(r))(r)$ of task $r$ can be found at (possibly variable) address $a_{ip}(r) \leq 0$. Furthermore, $s(r)$ also encodes a modifiable probability distribution $p(r) = \{ p_1(r), p_2(r), \ldots, p_{n_Q}(r) \}$ $(\sum_i p_i(r) = 1)$ on $Q$. This variable distribution will be used to select a new instruction in case $ip(r)$ points to the current topmost address right after the end of the current code $q$.

$a_{frozen} \geq 0 $ is a variable address that cannot decrease. Once chosen, the code bias $q_{0:a_{frozen}}$ will remain unchangeable forever -- it is a (possibly empty) sequence of programs $q^1q^2 \ldots$, some of them prewired by the user, others frozen after previous successful searches for solutions to previous tasks. Given $R$, the goal is to solve all tasks $r \in R$, by a program that appropriately uses or extends the current code $q_{0:a_{frozen}}$.

We will do this in a bias-optimal fashion, that is, no solution candidate will get much more search time than it deserves, given some initial probabilistic bias on program space $Q^*$:

Definition 2.1 (BIAS-OPTIMAL SEARCHERS)   Given is a problem class $\cal R$, a search space $\cal C$ of solution candidates (where any problem $r \in \cal R$ should have a solution in $\cal C$), a task-dependent bias in form of conditional probability distributions $P(q \mid r)$ on the candidates $q \in \cal C$, and a predefined procedure that creates and tests any given $q$ on any $r \in \cal R$ within time $t(q,r)$ (typically unknown in advance). A searcher is $n$-bias-optimal ($n \geq 1$) if for any maximal total search time $T_{max} > 0$ it is guaranteed to solve any problem $r \in \cal R$ if it has a solution $p \in \cal C$ satisfying $t(p,r) \leq P(p \mid r)~T_{max}/n$.

Unlike reinforcement learners [4] and heuristics such as Genetic Programming [2], OOPS (section 2.2) will be $n$-bias-optimal, where $n$ is a small and acceptable number, such as 8.



Subsections
next up previous
Next: OOPS Prerequisites: Multitasking & Up: Bias-Optimal Incremental Problem Solving Previous: Brief Introduction to Optimal
Juergen Schmidhuber 2003-02-25


Back to OOPS home page