There are many different ways of implementing SSA. Two of them are described in this paper. The first leads to a ``self-referential'' system using assembler-like primitive instructions to modify its own policy -- the system's learning mechanism is embedded within the system, and accessible to self-manipulation. The second implementation leads to a general reinforcement learning algorithm for recurrent nets. Alternatively, however, the PMP from section 1 may be designed to execute arbitrary, conventional or non-conventional learning or search algorithms.
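The self-referential idea above can be illustrated with a toy sketch: the policy is a table of probabilities, and one primitive instruction modifies that table itself, so learning actions are ordinary policy actions. All names, sizes, and the update rule below are illustrative assumptions, not the paper's actual instruction set.

```python
# Toy sketch of a self-referential policy (assumed design, not the
# paper's): the policy is a probability table, and the primitive
# instruction inc_prob modifies the table itself -- the learning
# mechanism is embedded in the policy and accessible to it.

N_CELLS, N_VALUES = 4, 3  # illustrative sizes

def make_policy():
    # each policy cell holds a distribution over possible values
    return [[1.0 / N_VALUES] * N_VALUES for _ in range(N_CELLS)]

def inc_prob(policy, cell, value, amount=0.1):
    """Primitive instruction: raise policy[cell][value], then
    renormalize, so self-modification is just another action."""
    policy[cell][value] += amount
    z = sum(policy[cell])
    policy[cell] = [p / z for p in policy[cell]]
    return policy
```

In such a system, sequences of inc_prob-like instructions are themselves generated by the policy, so the policy can in principle learn to improve its own learning procedure.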
Other SSA applications. In recent work [50,42], we combine SSA and Levin search (LS) [18,20] to solve partially observable Markov decision problems (POMDPs). POMDPs have received much attention in the reinforcement learning community. LS is theoretically optimal for a wide variety of search problems including many POMDPs. We show that LS can solve partially observable mazes (POMs) involving many more states and obstacles than those solved by various previous authors (here, LS can also easily outperform Q-learning). We then note, however, that LS is not necessarily optimal for ``incremental'' learning problems where experience with previous problems may help to reduce future search costs. For this reason, we introduce a heuristic, adaptive extension of LS (ALS) which uses experience to increase the probabilities of instructions occurring in successful programs found by LS. To deal with cases where ALS does not lead to long-term performance improvement, we use SSA as a safety belt. Experiments with additional POMs demonstrate: (a) ALS can dramatically reduce the search time consumed by successive calls of LS. (b) Additional significant speed-ups can be obtained by combining ALS and SSA.
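The ALS probability update described above can be sketched as follows. The instruction set, learning rate, and exact update rule are illustrative assumptions; the sketch only shows the core idea of shifting probability mass toward instructions that occurred in a successful program found by LS.

```python
# Hedged sketch of the ALS-style update: after LS finds a successful
# program, increase the probabilities of the instructions it contains,
# biasing future searches. Names and the update rule are assumptions.

INSTRUCTIONS = ["jump", "inc", "dec", "output", "halt"]  # toy set

def init_probs():
    # plain LS starts from uniform instruction probabilities
    n = len(INSTRUCTIONS)
    return {ins: 1.0 / n for ins in INSTRUCTIONS}

def als_update(probs, successful_program, lr=0.1):
    """Move the distribution a step toward the empirical instruction
    frequencies of a program that solved the current problem."""
    counts = {ins: successful_program.count(ins) for ins in probs}
    total = sum(counts.values()) or 1
    for ins in probs:
        target = counts[ins] / total
        probs[ins] += lr * (target - probs[ins])
    # renormalize to keep a valid probability distribution
    z = sum(probs.values())
    for ins in probs:
        probs[ins] /= z
    return probs
```

Under such an update, instructions used by earlier solutions become more probable, which can shorten later searches for related problems; SSA then serves as the safety belt that undoes updates failing to improve long-term performance.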
In other recent work, we use SSA for a multi-agent system with agents much more complex than the ones in section 4. In fact, each agent uses incremental self-improvement as described in section 2. Experiments demonstrate the multi-agent system's effectiveness. For instance, a system consisting of three co-evolving agents chasing each other learns rather sophisticated, stochastic predator and prey strategies. Additional applications of SSA to quite challenging, complex multi-agent tasks are described in [41,40].