In general, generalization is impossible. Given the history of a particular universe up to a given time, there are infinitely many possible continuations. Most of these continuations have nothing to do with the previous history. To see this, suppose we have observed a partial history Ski,j (the substring between the i-th and the j-th symbol of Ek). Now we want to generalize from previous experience to predict Skj+1,l, l > j. To do this, we need an algorithm that computes Skj+1,l from Ski,j (Ski,j may be stored on a separate, additional input tape for an appropriate universal Turing machine). There are 3l - j possible futures. But for c < l - j, there are less than (1/3)c * 3l - j algorithms with less than l - j - c bits computing such a future, given Ski,j. Hence in most cases the shortest algorithm computing the future, given the past, won't be much shorter than the shortest algorithm computing the future from nothing. Both will have about the size of the entire future. In other words, the mutual algorithmic information between history and future will be zero. As a consequence, in most universes (those that can be computed by long algorithms only), successful generalization from previous experience is not possible. Neither is inductive transfer. This simple insight is related to results in [#!Wolpert:96!#].
Learning. Given the above, since learning means to use previous experience to improve future performance, learning is possible only in the few regular universes (no learning without compressibility). On the other hand, regularity by itself is not sufficient to allow for learning. For instance, there is a highly compressible and regular universe represented by ``,,,,,,,...''. It is too simple to allow for processes we would be willing to call learning.
In what follows, I will assume that a regular universe is complex enough to allow for identifying certain permanent data structures of a general learner to be described below. For convenience, I will abstract from bitstring models, and instead talk about environments, rewards, stacks etc. Of course, all these abstract concepts are representable as bitstrings.
Scenario. In general, the learner's life is limited. To it, time will be important (not to the Great Programmer though). Suppose its life in environment lasts from time 0 to unknown time T. In between it repeats the following cycle over and over again ( denotes a set of possible actions): select and execute with probability , where the modifiable policy P is a variable, conditional probability distribution on the possible actions, given current . Action a will consume time and may change and P. Actions that modify P are called primitive learning algorithms (PLAs). P influences the way P is modified (``self-modification''). Policy modification processes (PMPs) are action subsequences that include PLAs. The i-th PMP in system life is denoted PMPi, starts at time si > 0, ends at ei < T, ei > si, and computes a sequence of P-modifications denoted Mi. Both si and ei are computed dynamically by special instructions in executed according to P itself: P says when to start and end PMPs.
Occasionally provides real-valued reward. The cumulative reward obtained in between time 0 and time t > 0 is denoted by R(t) (where R(0) = 0). At each PMP-start si the learner's goal is to use experience to generate P-modifications to accelerate future reward intake. Assuming that reward acceleration is possible at all, given E and , how can the learner achieve it? I will describe a rather general way of doing so.
The success-story criterion.
Each PMP-start time si will trigger an evaluation
of the system's performance so far.
Since si is computed according to P,
P incorporates information
about when to evaluate itself.
Evaluations may cause
policy modifications to be undone (by restoring
the previous policy -- in practical implementations,
this requires to store previous values of modified
policy components on a stack).
At a given PMP-start t in the learner's life,
let V(t) denot the set of those previous si whose
corresponding Mi have not
been undone yet. If V(t) is not empty,
then let
denote the i-th such time, ordered according to size.
The success-story criterion
SSC is satisfied if either V(t) is empty (trivial case) or if