The sections above treated the case of passive prediction, given the
observations. Note, however, that agents interacting with an environment
can also use predictions of the future to compute action sequences
that maximize expected future reward. Hutter's AIXI model
[10] does exactly this, by combining Solomonoff's
$M$-based universal prediction scheme with an expectimax
computation. It can be shown that the conditional $M$-probability of
environmental inputs to an AIXI agent, given the agent's earlier inputs and actions,
converges with increasing length of interaction to the true but unknown
probability $\mu$ [10],
as long as the latter is recursively computable, analogously
to the passive prediction case.
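The structure of "mixture prediction plus expectimax" can be sketched in a few lines under strong simplifying assumptions. Here Solomonoff's $M$ is replaced by a Bayes mixture over a hypothetical, hand-picked class of three two-armed bandit environments (`ENVS`, with a uniform `PRIOR`). Percepts are simply the rewards, and planning uses a finite horizon. The names `mixture_predict` and `expectimax` are illustrative only and are not part of Hutter's formulation. The sketch does not approximate AIXI in any meaningful sense; it only shows how conditional mixture predictions feed an expectimax recursion.

```python
# Toy illustration only: a finite Bayes mixture stands in for Solomonoff's M.
# Hypothetical environment class: each entry gives P(reward = 1 | action)
# for actions 0 and 1. Percepts are the rewards themselves.
ENVS = [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]
PRIOR = [1 / 3, 1 / 3, 1 / 3]
ACTIONS = (0, 1)
PERCEPTS = (0, 1)


def likelihood(env, history):
    """Probability that `env` produces the percepts in `history`,
    given the actions in `history` (a list of (action, percept) pairs)."""
    p = 1.0
    for a, e in history:
        p *= env[a] if e == 1 else 1.0 - env[a]
    return p


def mixture_predict(history, action, percept):
    """Conditional mixture probability xi(percept | history, action)."""
    joint = sum(
        w * likelihood(env, history)
        * (env[action] if percept == 1 else 1.0 - env[action])
        for w, env in zip(PRIOR, ENVS)
    )
    norm = sum(w * likelihood(env, history) for w, env in zip(PRIOR, ENVS))
    return joint / norm


def expectimax(history, horizon):
    """Return (value, best_action): maximize expected total reward over
    `horizon` remaining steps, using the mixture's predictions."""
    if horizon == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        value = 0.0
        for e in PERCEPTS:
            p = mixture_predict(history, a, e)
            future, _ = expectimax(history + [(a, e)], horizon - 1)
            value += p * (e + future)  # the reward equals the percept here
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


if __name__ == "__main__":
    # After action 1 was rewarded three times, plan two steps ahead.
    print(expectimax([(1, 1)] * 3, 2))
```

Consider a history in which action 1 has been rewarded three times. Here `expectimax([(1, 1)] * 3, 2)` selects action 1, because the mixture's posterior has shifted toward the environment in which that arm pays off more often. The recursion visits $(|A|\cdot|E|)^h$ histories for horizon $h$, so even this toy version grows exponentially with the horizon.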
We can modify the AIXI model such that its predictions are based on the
$\epsilon$-approximable Speed Prior $S$
instead of the incomputable $M$. Thus we obtain
the so-called AIS model. Using Hutter's approach [10]
we can now show that the conditional $S$-probability of environmental
inputs to an AIS agent, given the earlier inputs and actions, converges
to the true but unknown probability,
as long as the latter is dominated by $S$, such as the distribution
discussed in subsection 4.
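The role of the dominance condition can be illustrated with a toy stand-in. The sketch below does not implement $S$. It assumes a hypothetical class of nine Bernoulli sources (`THETAS`) with a uniform prior and forms their Bayes mixture $\xi$. Let $w_\mu$ denote the prior weight of the true source $\mu$ (here Bernoulli(0.7)). Because $\mu$ belongs to the class, $\xi(x_{1:n}) \ge w_\mu\,\mu(x_{1:n})$ holds for every sequence, and the mixture's conditional prediction for the next bit approaches $\mu$'s. Hutter-style convergence arguments rely on this kind of dominance property; the code only checks it numerically for one sampled sequence.

```python
import math
import random

# Hypothetical finite class of Bernoulli sources, a stand-in for the far
# larger class of distributions that S (or M) mixes over.
THETAS = [i / 10 for i in range(1, 10)]
LOG_PRIOR = [math.log(1 / len(THETAS))] * len(THETAS)


def log_bernoulli(theta, bit):
    """Log-probability of one bit under Bernoulli(theta)."""
    return math.log(theta if bit == 1 else 1.0 - theta)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def run(mu_theta=0.7, n=2000, seed=0):
    """Sample n bits from Bernoulli(mu_theta). Return the mixture's predicted
    P(next bit = 1), log xi(x_1:n), and log mu(x_1:n)."""
    rng = random.Random(seed)
    log_post = list(LOG_PRIOR)  # unnormalized: log w_nu + log nu(x_1:t)
    log_mu = 0.0
    for _ in range(n):
        bit = 1 if rng.random() < mu_theta else 0
        log_mu += log_bernoulli(mu_theta, bit)
        log_post = [lp + log_bernoulli(th, bit) for lp, th in zip(log_post, THETAS)]
    log_xi = logsumexp(log_post)  # log of the mixture probability xi(x_1:n)
    # Posterior-weighted average of the sources' next-bit predictions.
    pred = sum(math.exp(lp - log_xi) * th for lp, th in zip(log_post, THETAS))
    return pred, log_xi, log_mu


if __name__ == "__main__":
    pred, log_xi, log_mu = run()
    print(f"xi(next = 1 | x_1:n) = {pred:.3f}")
    print(f"log xi - log mu = {log_xi - log_mu:.3f} >= log w_mu = {math.log(1 / 9):.3f}")
```

If `mu_theta` is set to a value outside `THETAS` (say 0.75), the mixture no longer dominates the source by a constant factor. This mirrors why the text requires the true distribution to be dominated by $S$.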