There are many different ways of implementing SSA. Two of them are described in this paper. The first leads to a ``self-referential'' system using assembler-like primitive instructions to modify its own policy -- the system's learning mechanism is embedded within the system, and accessible to self-manipulation. The second implementation leads to a general reinforcement learning algorithm for recurrent nets. Alternatively, however, the PMP from section 1 may be designed to execute arbitrary, conventional or non-conventional learning or search algorithms.

**Other SSA applications.**
In recent work [50,42],
we combine SSA and Levin search
(LS) [18,20]
to solve
partially observable Markov decision problems (POMDPs). POMDPs
have received much attention in the reinforcement
learning community.
LS is theoretically optimal for a wide variety of search problems
including many POMDPs. We show
that LS can solve partially observable mazes (POMs)
involving many more states and obstacles
than those solved by various previous authors (here, LS
can also easily outperform Q-learning). We then note, however,
that LS is not necessarily optimal for ``incremental''
learning problems where experience with previous problems may
help to reduce future search costs. For this reason, we
introduce a heuristic, adaptive extension of LS (ALS) which uses experience
to increase probabilities of instructions occurring in successful
programs found by LS. To deal with cases where ALS does not
lead to long-term performance improvement, we use SSA
as a safety belt. Experiments with additional POMs demonstrate:
(a) ALS can dramatically reduce the search time consumed by
successive calls of LS. (b) Additional significant speed-ups
can be obtained by combining ALS and SSA.
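The LS/ALS combination above can be illustrated with a minimal sketch. The instruction set, the length-based surrogate for runtime in the Levin cost, and the additive-then-renormalize probability update are all illustrative assumptions, not the exact scheme of [50,42]:

```python
import itertools
import math

# Hypothetical primitive instructions (illustrative only).
INSTRUCTIONS = ("a", "b", "c")

def program_prob(prog, probs):
    """Probability of a program under independent instruction probabilities."""
    p = 1.0
    for ins in prog:
        p *= probs[ins]
    return p

def levin_search(probs, is_solution, max_len=4):
    """Try candidate programs in order of increasing Levin-style cost.
    As a simplification, runtime is approximated by program length, so the
    cost is len(prog) - log2(P(prog)). Returns (solution, trials)."""
    candidates = []
    for length in range(1, max_len + 1):
        for prog in itertools.product(INSTRUCTIONS, repeat=length):
            cost = length - math.log2(program_prob(prog, probs))
            candidates.append((cost, prog))
    candidates.sort(key=lambda c: c[0])
    for trials, (_, prog) in enumerate(candidates, start=1):
        if is_solution(prog):
            return prog, trials
    return None, len(candidates)

def als_update(probs, solution, rate=0.3):
    """ALS-style update (a sketch): shift probability mass toward the
    instructions occurring in a successful program, then renormalize."""
    for ins in solution:
        probs[ins] += rate
    total = sum(probs.values())
    for ins in probs:
        probs[ins] /= total
```

Solving the same toy task twice shows the intended effect: after one `als_update`, programs resembling the earlier solution move to the front of the enumeration, so the second call of `levin_search` needs far fewer trials. The SSA "safety belt" of the text would additionally undo such updates whenever they fail to improve long-term reward, which is omitted here.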

In other recent work [53], we use SSA for a multi-agent system with agents much more complex than the ones in section 4. In fact, each agent uses incremental self-improvement as described in section 2. Experiments demonstrate the multi-agent system's effectiveness. For instance, a system consisting of three co-evolving agents chasing each other learns rather sophisticated, stochastic predator and prey strategies. Additional applications of SSA to quite challenging, complex multiagent tasks are described in [41,40].