In what follows, I will briefly describe earlier work and the train of thought leading to this paper.
Meta-evolution. My first attempts to come up with schemes for ``true'' self-referential learning based on universal languages date back to 1987. They were partly inspired by a collaboration with Dickmanns and Winklhofer. We used a genetic algorithm (GA) to evolve variable-length programs for solving simple tasks (the system was implemented in PROLOG). Today, this approach would be classified as ``Genetic Programming'' (GP). Ours was in fact one of the first two papers on using GP-like algorithms to evolve assembler-like computer programs (although Cramer's work preceded ours). We applied our system to simple tasks, including the ``lawnmower problem'' (later also studied by Koza, 1994). Since our programming language was general (for instance, we allowed for programs with loops), our approach had at least the same potential as the one based on Koza's so-called ``automatically defined functions''. In subsequent work (1987 -- 1995), however, we found GP unsatisfactory. One reason was that GP's way of constructing new code from old code does not improve itself: it always remains limited to the initial crossover and mutation mechanisms. This created a desire to improve the trivial mutation and crossover strategies used to construct new programs from old ones. In 1987, I developed an algorithmic scheme (called ``meta-evolution'') for letting more sophisticated strategies be learned by a potentially infinite hierarchy of higher-level GAs whose domain is to construct construction strategies. Meta-evolution recursively creates a growing hierarchy of pools of programs: higher-level pools contain program-modifying programs that are applied to lower-level programs and rewarded based on lower-level performance. Details can be found in the original report.
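To make the recursive reward flow concrete, here is a deliberately tiny Python sketch of the two lowest levels of such a hierarchy. Everything here is an illustrative assumption: the actual 1987 system evolved variable-length PROLOG programs with an unbounded tower of levels, whereas this toy uses a fixed bit-vector task and just two hand-written modifiers.

```python
import random

random.seed(0)  # deterministic toy run

# Level-0 task (an illustrative assumption): evolve a bit vector matching TARGET.
TARGET = [1, 1, 1, 1, 1]

def fitness(prog):
    return sum(a == b for a, b in zip(prog, TARGET))

# Level-1 "program-modifying programs": simple mutation strategies.
def flip_one(prog):
    i = random.randrange(len(prog))
    return prog[:i] + [1 - prog[i]] + prog[i + 1:]

def flip_two(prog):
    return flip_one(flip_one(prog))

# Level-1 pool with accumulated reward: each modifier is credited with
# the fitness gain it produces on level-0 programs.
modifiers = {flip_one: 0.0, flip_two: 0.0}

pool = [[0, 0, 0, 0, 0] for _ in range(4)]  # level-0 pool
for _ in range(200):
    mod = random.choice(list(modifiers))    # pick a program-modifying program
    i = random.randrange(len(pool))
    candidate = mod(pool[i])
    gain = fitness(candidate) - fitness(pool[i])
    modifiers[mod] += gain                  # reward flows up to the modifier
    if gain >= 0:                           # keep non-worsening modifications
        pool[i] = candidate

best = max(pool, key=fitness)
```

In the full scheme, a level-2 pool would contain programs that construct or modify the level-1 modifiers themselves, rewarded by level-1 performance, and so on recursively up the hierarchy.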
Collapsing meta-levels. The explicit creation of ``meta-levels'' and ``meta-meta-levels'' seemed unnatural, however. For this reason, alternative systems based on ``self-referential'' languages were explored, the goal being to collapse all meta-levels into a single one. At that time, though, no convincing global credit assignment strategy was available.
Self-referential neural nets. Later work presented a neural network with the potential to run its own weight change algorithm [30,29]. In this system, top-level credit assignment is performed by gradient descent. This is unsatisfactory, however, due to problems with local minima, and because repeatable training sequences are required; in general, this makes it impossible to take the entire learning history into account.
Algorithmic probability / Universal search. Levin's universal search algorithm is theoretically optimal for certain ``non-incremental'' search tasks with exactly repeatable initial conditions. See Levin [19,20]; see also Adleman. There have been a few attempts to extend universal search to incremental learning situations, where previous ``trials'' may provide information about how to speed up further learning, see e.g. [46,22,31]. For instance, to improve future performance, Solomonoff [45,46] describes more traditional (as opposed to self-improving) methods for assigning probabilities to successful ``subprograms''. Alternatively, one of the systems actually implemented in earlier work simply keeps successful code in its program area. This system was a conceptual starting point for the one in the current paper. In first attempts (September 1994), the probability distributions underlying the Turing-machine-equivalent language required for universal search were modified heuristically. One strategy was to slightly increase the context-dependent probabilities of program cell contents used in successful programs, and then continue universal search based on the new probability distributions. In a number of experiments, this actually led to good results (at first glance, more impressive results than those in the current paper, at least if one does not take the lack of bias into account, as one should always do). The system was unsatisfactory, however, precisely because there was no principled way of adjusting the probability distributions. This criticism led to the ideas expressed in the current paper.
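A minimal sketch of universal search's phase structure may help, assuming toy bit-string programs with a fixed uniform prior $P(p) = 2^{-l(p)}$; note this is an assumption for illustration, since the 1994 system instead used adjustable, context-dependent probabilities over program cell contents, and the `run`/`is_solution` interfaces here are hypothetical.

```python
import itertools

def all_bitstrings():
    """Enumerate candidate programs (toy bit strings) by increasing length."""
    for n in itertools.count(1):
        for bits in itertools.product("01", repeat=n):
            yield "".join(bits)

def universal_search(run, is_solution, max_phase=20):
    """Levin-style search: in phase i, program p gets a time budget
    proportional to P(p) * 2**i, with P(p) = 2**-len(p) here."""
    for phase in range(1, max_phase + 1):
        for p in all_bitstrings():
            budget = int(2 ** (phase - len(p)))
            if budget < 1:   # all remaining (longer) programs get < 1 step
                break
            result = run(p, budget)  # run p for at most `budget` steps
            if result is not None and is_solution(result):
                return p, phase
    return None

# Toy interpreter (an assumption): running p "costs" len(p) steps and
# returns its number of 1-bits; we search for a program evaluating to 3.
found = universal_search(
    run=lambda p, budget: p.count("1") if budget >= len(p) else None,
    is_solution=lambda v: v == 3,
)
```

The heuristic described above would, after each success, raise the probabilities of the symbols used in the winning program and restart the search under the new distribution; the absence of a principled rule for that update is exactly the criticism made in the text.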
Meta-version of universal search. Without going into details, Solomonoff mentions that self-improvement may be formulated as a time-limited optimization problem, thus being solvable by universal search. However, the straightforward meta-version of universal search (generating and evaluating probability distributions in order of their Levin complexities) merely defers the credit assignment problem to the meta-level, and does not necessarily make optimal incremental use of computational resources and previous experience. In fact, just like exhaustive search, but unlike SSA, universal search by itself cannot properly deal with changing environments. However, variants of universal search may be used as the parameter modification algorithms executed by PMPs (see section 1 and recent work with Marco Wiering [50,42]).