**In general, generalization is impossible.**
Given the history of a particular universe up to a given
time, there are infinitely many possible continuations.
Most of these continuations have nothing to do with the previous history.
To see this, suppose we have observed
a partial history *S*_{k}^{i,j} (the substring between
the *i*-th and the *j*-th symbol of *E*_{k}).
Now we want to generalize from previous
experience to predict
*S*_{k}^{j+1,l}, *l* > *j*.
To do this, we need an algorithm that computes
*S*_{k}^{j+1,l} from *S*_{k}^{i,j}
(*S*_{k}^{i,j} may be stored on a separate, additional
input tape for an appropriate
universal Turing machine).
There are 2^{l - j} possible futures.
But for *c* < *l* - *j*, there are fewer than
(1/2)^{c} * 2^{l - j} algorithms
with fewer than *l* - *j* - *c* bits computing such a future,
given *S*_{k}^{i,j}.
Hence in most cases
the shortest algorithm computing the future, given the past,
won't be much shorter than the
shortest algorithm computing the future from nothing.
Both will have about the size of the entire future.
In other words, the mutual algorithmic information between history
and future will be zero.
As a consequence, in most universes (those that can be computed by
long algorithms only), successful generalization from previous experience is
not possible. Neither is inductive transfer.
This simple insight is related to
results in [#!Wolpert:96!#].
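
The counting argument above is easy to check numerically. The following sketch (an illustration of mine, not from the source) bounds the fraction of length-*m* futures that any program with fewer than *m* - *c* bits could compute:

```python
# Counting argument: there are fewer than 2**n binary programs with
# fewer than n bits (at most 2**0 + 2**1 + ... + 2**(n-1) = 2**n - 1),
# but 2**m possible futures of length m. Hence at most a (1/2)**c
# fraction of those futures can be computed by programs with fewer
# than m - c bits; the rest are incompressible by margin c.

def max_fraction_compressible(m: int, c: int) -> float:
    """Upper bound on the fraction of length-m futures computable
    by programs shorter than m - c bits."""
    num_futures = 2 ** m
    num_short_programs = 2 ** (m - c) - 1
    return num_short_programs / num_futures

for c in (1, 5, 10, 20):
    print(f"c = {c:2d}: at most {max_fraction_compressible(40, c):.2e} of futures")
```

Already for modest *c* the compressible fraction is negligible, which is why the shortest algorithm computing the future from the past is typically about as long as the future itself.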

**Learning.**
Given the above, since learning means using previous experience
to improve future performance, learning is possible only in
the few regular universes
(no learning without compressibility).
On the other hand, regularity by itself is not sufficient
to allow for learning. For instance, there is a highly compressible and
regular universe represented
by ",,,,,,,...". It is too simple to allow for processes we would
be willing to call learning.
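
As a toy illustration of this point (mine, not the source's), an off-the-shelf compressor already separates the two cases: the regular universe of commas shrinks to a constant-size description, while a typical incompressible universe of the same length does not, and only the former leaves any room for learning.

```python
import random
import zlib

def trivial_universe(n: int) -> str:
    """First n symbols of the highly regular universe ",,,,,..."."""
    return "," * n

regular = trivial_universe(10_000).encode()

random.seed(0)
noisy = bytes(random.getrandbits(8) for _ in range(10_000))

# The regular universe compresses to a few dozen bytes; the random one
# stays near its original size.
print(len(zlib.compress(regular)), len(zlib.compress(noisy)))
```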

In what follows, I will assume that a regular universe is complex enough to allow for identifying certain permanent data structures of a general learner to be described below. For convenience, I will abstract from bitstring models, and instead talk about environments, rewards, stacks etc. Of course, all these abstract concepts are representable as bitstrings.

**Scenario.**
In general, the learner's life is limited.
To it, time will be important (not to the Great Programmer though).
Suppose its life in environment *E*
lasts from time 0 to unknown time *T*.
In between it repeats the following cycle
over and over again (*A*
denotes a set of possible actions):
select and execute some *a* ∈ *A*
with conditional probability *P*(*a* | *E*),
where the modifiable policy *P* is a variable,
conditional probability distribution on the possible actions,
given the current *E*.
Action *a* will consume time and may change
*E* and *P*.
Actions that modify *P* are called primitive learning algorithms (PLAs).
*P* influences the way *P* is modified (``self-modification'').
*Policy modification processes* (PMPs)
are action subsequences that include PLAs.
The *i*-th PMP in system life is denoted *PMP*_{i},
starts at time *s*_{i} > 0, ends at *e*_{i} < *T*, *e*_{i} > *s*_{i},
and computes a sequence of *P*-modifications denoted *M*_{i}.
Both *s*_{i} and *e*_{i} are computed dynamically
by special instructions in *A*
executed according to *P* itself: *P* says when to start and end PMPs.
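
The cycle above can be sketched in a few lines. The concrete action set, the choice of which action counts as a primitive learning algorithm, and the way it edits *P* are all placeholders of my own choosing, not part of the source:

```python
import random

class Learner:
    """Minimal sketch of the scenario: a stochastic policy P selects
    actions; some actions (PLAs) modify P itself, and P also decides
    when policy-modification processes (PMPs) start."""

    def __init__(self, actions):
        self.actions = list(actions)                 # the action set A
        # P: probabilities of actions (conditioning on E omitted here).
        self.P = {a: 1.0 / len(actions) for a in actions}
        self.pmp_starts = []                         # the times s_i

    def select_action(self):
        acts = list(self.P)
        weights = [self.P[a] for a in acts]
        return random.choices(acts, weights=weights)[0]

    def step(self, t):
        a = self.select_action()
        if a == "begin_pmp":                         # P says when PMPs start
            self.pmp_starts.append(t)
        elif a == "pla_shift":                       # a primitive learning
            self.P["noop"] += 0.1                    # algorithm: edits P itself
            total = sum(self.P.values())
            self.P = {k: v / total for k, v in self.P.items()}
        return a

learner = Learner(["noop", "begin_pmp", "pla_shift"])
for t in range(100):
    learner.step(t)
```

Note the self-referential loop: because `pla_shift` is selected according to `P` and then rewrites `P`, the policy influences its own future modifications, exactly the "self-modification" property described above.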

Occasionally *E* provides
real-valued reward. The cumulative reward obtained between
time 0 and time *t* > 0 is denoted by *R*(*t*) (where *R*(0) = 0).
At each PMP-start *s*_{i} the learner's goal is to
use experience to generate *P*-modifications to
accelerate future reward intake.
Assuming that reward acceleration is possible at all,
given *E* and *A*,
how can the learner achieve it?
I will describe a rather general way of doing so.

**The success-story criterion.**
Each PMP-start time *s*_{i} will trigger an evaluation
of the system's performance so far.
Since *s*_{i} is computed according to *P*,
*P* incorporates information
about when to evaluate itself.
Evaluations may cause
policy modifications to be undone (by restoring
the previous policy -- in practical implementations,
this requires storing the previous values of modified
policy components on a stack).
At a given PMP-start *t* in the learner's life,
let *V*(*t*) denote the set of those previous *s*_{i} whose
corresponding *M*_{i} have not
been undone yet. If *V*(*t*) is not empty,
then let *v*_{i} (*i* ∈ {1, ..., |*V*(*t*)|})
denote the *i*-th such time, ordered according to size.
The success-story criterion
SSC is satisfied if either *V*(*t*) is empty (trivial case) or if

*R*(*t*)/*t* < (*R*(*t*) - *R*(*v*_{1}))/(*t* - *v*_{1}) < (*R*(*t*) - *R*(*v*_{2}))/(*t* - *v*_{2}) < ... < (*R*(*t*) - *R*(*v*_{|*V*(*t*)|}))/(*t* - *v*_{|*V*(*t*)|}).
SSC essentially says that each surviving
*P*-modification has been followed by long-term reward acceleration:
the average reward per time since its creation exceeds the average
reward per time measured since each earlier surviving modification,
and since the beginning of the learner's life.
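
SSC can be stated as a chain of strictly increasing reward-per-time ratios: average reward per time since the start of life, then since each still-valid modification time, in temporal order. A minimal sketch of a checker, and of the stack-based undoing it drives at evaluation time, follows; the names and the undo-callback representation are assumptions of mine, not the source's:

```python
def ssc_satisfied(R, t, valid_times):
    """Success-story criterion: reward per time since the start of life,
    then since each still-valid policy modification (in temporal order),
    must be strictly increasing. Trivially true if no modification is
    still valid."""
    ratios = [R(t) / t]
    for v in sorted(valid_times):
        ratios.append((R(t) - R(v)) / (t - v))
    return all(x < y for x, y in zip(ratios, ratios[1:]))

def enforce_ssc(R, t, stack):
    """Evaluation at a PMP start: pop (and undo) the most recent
    modifications until SSC holds for the survivors. Each stack entry
    is (s_i, undo_callback) restoring the overwritten policy parts."""
    while stack and not ssc_satisfied(R, t, [s for s, _ in stack]):
        _, undo = stack.pop()
        undo()
    return stack
```

For example, with accelerating cumulative reward `R(t) = t**2`, modifications made at times 2 and 5 satisfy SSC at `t = 10`; with merely linear reward `R(t) = t`, the ratios are all equal, SSC fails, and every modification gets undone.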