The main purpose of this paper was to describe a theoretically sound principle for environment-independent, ``one-way'', single-life reinforcement learning. To illustrate basic aspects of the principle, the remainder of this paper presents a few experiments. These, however, in no way represent a systematic experimental analysis. Much more complex experiments are described in other recent papers on this subject, e.g., [50,42,40].
The experiments in the current section demonstrate that the system from section 2 can indeed learn to compute SSMs leading to faster and faster reinforcement intake. In addition to the 17 general, assembler-like instructions mentioned in section 2, the system uses low-level, problem-specific instructions. Together, these instructions reflect the system's initial (weak) bias. Of course, different problem-specific instructions lead to different initial bias and performance. Even the numbering of the primitives may influence performance, because the numbers of executed primitives appear in the corresponding program cells; later, these numbers may be reused (abused), e.g., as addresses or as arguments of arithmetic operations. In a given environment, reinforcement intake may be greatly accelerated within a few minutes using one set of instructions, while the same degree of improvement may require a day of computation time using a different set. The purpose of this section, however, is not to perform a statistically significant experimental evaluation of the system's initial bias, to study the effects of introducing different kinds of initial bias, or to compare the system to other learning systems with different initial bias. Instead, its purpose is to illustrate typical aspects of the system's basic (bias-independent) mode of operation. See, e.g., [42,40] for more complex applications.
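To make the remark about primitive numbers concrete, the following is a minimal toy sketch, not the paper's actual 17-instruction set: a hypothetical interpreter records the opcode number of each executed primitive in a program cell, and later instructions may then reinterpret such a recorded number as an arithmetic operand or as an address. All opcodes and cell conventions here are illustrative assumptions.

```python
def run(program, num_cells=8):
    """Toy interpreter: each instruction is a tuple (opcode, args...).

    Hypothetical convention (an assumption, for illustration only):
    executing the instruction at position p writes its opcode number
    into cell p, so opcode numbers accumulate in the cell array and
    can later be reused as operands or addresses.
    """
    cells = [0] * num_cells
    for pos, (op, *args) in enumerate(program):
        cells[pos % num_cells] = op            # record executed primitive's number
        if op == 2:                            # ADD dst, a, b: cells[dst] = cells[a] + cells[b]
            dst, a, b = args
            cells[dst] = cells[a] + cells[b]
        elif op == 3:                          # INDIRECT dst, ac: treat cells[ac] as an address
            dst, ac = args
            cells[dst] = cells[cells[ac] % num_cells]
    return cells

# The first ADD reuses the recorded opcode 2 (in cell 0) as an operand;
# the INDIRECT instruction reuses the opcode 2 recorded in cell 1 as an address.
print(run([(2, 3, 0, 0), (2, 4, 3, 0), (3, 5, 1)]))
```

In this sketch the final cell contents depend on which opcode numbers were assigned to which primitives, mirroring the paper's observation that even the numbering of the primitives can influence performance.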