
Basic Cycle of Operations

Until unknown time $T$ (system death), the system repeats the following basic instruction cycle over and over.

Select instruction head $a_j \in I$ with probability $Q(IP,j)$, where

$$ Q(i,j) = \frac{f({\sc Right}_{i,j}, {\sc Left}_{i,j})}{\sum_k f({\sc Right}_{i,k}, {\sc Left}_{i,k})}, \quad \text{for } i \in \{0, \ldots, m-1\},\; j \in \{0, \ldots, n-1\}. $$

Here the collective decision function $f(x,y)$ maps real-valued $x,y$ to real values. With an appropriate $f$, each module can "veto" instructions suggested by the other module: only instructions that are strongly supported by both modules are likely to be selected. One possibility is $f(x,y) = \min(x,y)$. In the experiments I use $f(x,y) = xy$.
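The normalization above and the "veto" effect of $f$ can be checked numerically. The following is a minimal sketch, assuming ${\sc Right}$ and ${\sc Left}$ are stored as $m \times n$ NumPy arrays of positive reals (so every row of $Q$ is a valid distribution); the function name `collective_Q` and the example numbers are hypothetical.

```python
import numpy as np

def collective_Q(right, left, f=np.multiply):
    """Return the m x n matrix Q with
    Q[i, j] = f(Right[i, j], Left[i, j]) / sum_k f(Right[i, k], Left[i, k]).

    Assumes right and left are m x n arrays of positive reals, so every row
    Q[i, :] is a probability distribution over the n possible values j.
    """
    scores = f(right, left)                            # elementwise f(x, y)
    return scores / scores.sum(axis=1, keepdims=True)  # normalize row i over j

# One row, three candidate values: Right strongly favors j = 0, Left does not.
right = np.array([[0.90, 0.05, 0.05]])
left = np.array([[0.01, 0.50, 0.49]])

Q_prod = collective_Q(right, left)               # f(x, y) = x * y
Q_min = collective_Q(right, left, np.minimum)    # f(x, y) = min(x, y)
print(Q_prod.round(3))
print(Q_min.round(3))
```

Under both choices of $f$, Left's low score for $j = 0$ overrides Right's strong preference: $j = 0$ receives probability $0.009/0.0585 \approx 0.15$ under the product and $1/11$ under the minimum, while $j = 1, 2$ absorb most of the mass.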

Comment: owing to peculiarities of certain instructions introduced below, $Q(i,j)$ will later be refined for the case where $i$ addresses an instruction head rather than an argument.

The $n_j \leq 6$ arguments of $a_j$, each in $\{0, \ldots, n-1\}$, are selected according to the probability distributions $Q(IP+1, \cdot), Q(IP+2, \cdot), \ldots, Q(IP+n_j, \cdot)$. The exception is $a_j =$ Bet!, two of whose arguments are treated differently (see Section A.3.4 below).
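Head and argument selection amount to sampling from consecutive rows of $Q$. This is a minimal sketch assuming $Q$ is already available as an $m \times n$ NumPy array; `select_instruction` and the callable `n_args_of` (returning $n_j$ for head $j$) are hypothetical names, and the special treatment of Bet!'s arguments is not modeled.

```python
import numpy as np

def select_instruction(Q, ip, n_args_of, rng):
    """Sample an instruction head and its arguments from the rows of Q.

    Q: m x n array whose rows are probability distributions Q(i, .).
    n_args_of: hypothetical callable giving n_j (at most 6) for head j.
    The special treatment of Bet!'s arguments is not modeled here.
    """
    n = Q.shape[1]
    j = int(rng.choice(n, p=Q[ip]))                # head a_j ~ Q(IP, .)
    n_j = n_args_of(j)
    assert 0 <= n_j <= 6, "each instruction takes at most 6 arguments"
    # The k-th argument (k = 1..n_j) is drawn from Q(IP + k, .).
    args = [int(rng.choice(n, p=Q[ip + k])) for k in range(1, n_j + 1)]
    return j, args

# Toy example: m = 9, n = 4, and row i deterministically selects i mod 4.
Q = np.eye(4)[np.arange(9) % 4]
rng = np.random.default_rng(0)
print(select_instruction(Q, 0, lambda j: 3, rng))  # (0, [1, 2, 3])
```

The one-hot rows make the draw deterministic, which shows the indexing: the head comes from row $IP$ and the arguments from rows $IP+1, \ldots, IP+n_j$.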

Execute the selected instruction. This consumes time and may change (1) the environment ${\cal E}$, (2) IP, (3) the internal state ${\cal S}$, and (4a) ${\sc Right}$ and (4b) ${\sc Left}$. If there is an external reward $R$, set ${\cal S}_8 \leftarrow R$ (rewards become visible to the system in the form of inputs).

If an input has changed one of the cell contents ${\cal S}_0, {\cal S}_1, \ldots, {\cal S}_8$, then shift the contents of ${\cal S}_0, {\cal S}_1, \ldots, {\cal S}_{80}$ to components ${\cal S}_9, {\cal S}_{10}, \ldots, {\cal S}_{89}$, respectively. This results in a built-in short-term memory (long-term memory can be implemented by the system itself by executing appropriate instruction sequences).
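The reward write and the short-term memory shift can be sketched together, since the reward in ${\cal S}_8$ is itself one of the input cells whose change triggers the shift. This minimal sketch assumes ${\cal S}$ is a Python list with at least 90 cells and that inputs arrive as a dict from cell index (0 to 8) to value; `deliver_inputs` is a hypothetical name. Following the order of the steps above, the shift is applied after the new inputs are written.

```python
NUM_INPUT_CELLS = 9  # S_0..S_8 receive inputs; S_8 receives the reward
REWARD_CELL = 8

def deliver_inputs(S, inputs, reward=None):
    """Write inputs (dict: cell index 0..8 -> value) and an optional reward
    into S, then shift the short-term memory if any input cell changed.

    S is a Python list with at least 90 cells. Returns True if a shift happened.
    """
    if len(S) < 90:
        raise ValueError("S needs at least 90 cells")
    before = S[:NUM_INPUT_CELLS]
    for cell, value in inputs.items():
        if not 0 <= cell < NUM_INPUT_CELLS:
            raise ValueError(f"cell {cell} is not an input cell")
        S[cell] = value
    if reward is not None:
        S[REWARD_CELL] = reward  # rewards arrive as ordinary inputs
    if S[:NUM_INPUT_CELLS] != before:
        # Copy S_0..S_80 to S_9..S_89. The right-hand slice is a fresh list,
        # so the overlapping source and target ranges cannot corrupt each other.
        S[9:90] = S[0:81]
        return True
    return False

S = [float(i) for i in range(90)]
deliver_inputs(S, {}, reward=42.0)  # S_8 changes, so the shift happens
```

After the call, the new reward sits in both ${\cal S}_8$ and ${\cal S}_{17}$, and each older value has moved up by nine cells; the previous contents of ${\cal S}_{81}, \ldots, {\cal S}_{89}$ are overwritten.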

If $a_j$ did not modify IP (no conditional jump; compare the instruction list below), then compute the address of the next instruction head by setting IP $\leftarrow w - (w \bmod BS)$, that is, $w$ rounded down to a multiple of $BS$. Here $w = (w_1 n + w_2) \bmod m$, where $w_1 \in \{0, \ldots, n-1\}$ is selected according to the probability distribution $Q(IP+7, \cdot)$, while $w_2 \in \{0, \ldots, n-1\}$ is selected according to $Q(IP+8, \cdot)$.
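The address computation combines two base-$n$ digits into $w$ and then snaps $w$ down to a block boundary. A minimal sketch, assuming $Q$ is an $m \times n$ NumPy array; `jump_target` and `next_ip` are hypothetical names, and since $BS$ is not defined in this section, the value 9 in the example is only illustrative.

```python
import numpy as np

def jump_target(w1, w2, n, m, BS):
    """IP <- w - (w mod BS) with w = (w1 * n + w2) mod m."""
    w = (w1 * n + w2) % m
    return w - w % BS  # round w down to a multiple of BS

def next_ip(Q, ip, BS, rng):
    """Sample w1 ~ Q(IP + 7, .) and w2 ~ Q(IP + 8, .), then compute the next IP."""
    m, n = Q.shape
    w1 = int(rng.choice(n, p=Q[ip + 7]))
    w2 = int(rng.choice(n, p=Q[ip + 8]))
    return jump_target(w1, w2, n, m, BS)

# Worked example with illustrative sizes n = 5, m = 45, BS = 9:
# w = (3 * 5 + 4) mod 45 = 19, and 19 - (19 mod 9) = 18.
print(jump_target(3, 4, n=5, m=45, BS=9))  # 18
```

Rounding down guarantees that the new IP always lands on the start of a block, so the head and its argument and address cells keep their fixed offsets.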

Return to the first step (selection of the next instruction head).
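For orientation, the whole cycle can be strung together in one driver loop. This is a sketch under several assumptions, not the system itself: $f(x,y) = xy$ with positive entries, ${\cal S}$ as a Python list of at least 90 cells, $m$ a multiple of $BS \geq 9$ (so rows $IP+7$ and $IP+8$ exist), and a run truncated at a fixed number of iterations `T` standing in for the unknown time of system death. The callbacks `n_args_of`, `execute`, and `get_inputs` are hypothetical placeholders; the refinement of $Q$ for head rows and the Bet! exception are not modeled.

```python
import numpy as np

def q_row(right, left, i):
    """Q(i, .) with f(x, y) = x * y; assumes positive entries in Right and Left."""
    scores = right[i] * left[i]
    return scores / scores.sum()

def run(right, left, S, n_args_of, execute, get_inputs, BS, T, seed=0):
    """Sketch of the basic cycle, truncated at T iterations.

    Hypothetical callbacks: n_args_of(j) -> n_j; execute(j, args, S, right, left)
    may modify its arguments in place and returns a new IP for a jump, else None;
    get_inputs() -> (dict of input cell -> value, reward or None).
    """
    rng = np.random.default_rng(seed)
    m, n = right.shape
    ip, trace = 0, []
    for _ in range(T):
        # Select head a_j and its n_j arguments; rows are recomputed every time
        # because execution may have changed Right and Left.
        j = int(rng.choice(n, p=q_row(right, left, ip)))
        args = [int(rng.choice(n, p=q_row(right, left, ip + k)))
                for k in range(1, n_args_of(j) + 1)]
        trace.append((ip, j, args))
        # Execute; this may change S, Right, Left, and (for jumps) IP.
        new_ip = execute(j, args, S, right, left)
        # Deliver inputs and reward; shift short-term memory if an input changed.
        before = S[:9]
        inputs, reward = get_inputs()
        for cell, value in inputs.items():
            S[cell] = value
        if reward is not None:
            S[8] = reward
        if S[:9] != before:
            S[9:90] = S[0:81]
        # No jump: draw the next head address from Q(IP + 7, .) and Q(IP + 8, .).
        if new_ip is None:
            w1 = int(rng.choice(n, p=q_row(right, left, ip + 7)))
            w2 = int(rng.choice(n, p=q_row(right, left, ip + 8)))
            w = (w1 * n + w2) % m
            new_ip = w - w % BS
        ip = new_ip  # back to the first step
    return trace
```

With a no-op `execute` and uniform ${\sc Right}$ and ${\sc Left}$, the returned trace simply shows IP hopping between block starts at random, which is a quick way to sanity-check the bookkeeping before plugging in a real instruction set.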

Juergen Schmidhuber 2003-03-10
