Next: Physics Up: Machine Dependence / Suboptimal Previous: Relation to a Popular

Rational Decision Makers Based on Universal Predictors

The sections above treated passive prediction, given the observations. Agents interacting with an environment, however, can also use predictions of the future to compute action sequences that maximize expected future reward. Hutter's AIXI model [10] does exactly this by combining Solomonoff's $M$-based universal prediction scheme with an expectimax computation. It can be shown that the conditional $M$ probability of environmental inputs to an AIXI agent, given the agent's earlier inputs and actions, converges with increasing length of interaction to the true, unknown probability [10], as long as the latter is recursively computable, in analogy to the passive prediction case.

We can modify the AIXI model so that its predictions are based on the $\epsilon$-approximable Speed Prior $S$ instead of the incomputable $M$. This yields the so-called AIS model. Using Hutter's approach [10] we can now show that the conditional $S$ probability of environmental inputs to an AIS agent, given the earlier inputs and actions, converges to the true but unknown probability, as long as the latter is dominated by $S$, such as the $S'$ in subsection 4.
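
The expectimax computation shared by AIXI and AIS can be illustrated with a minimal Python sketch. It is not Hutter's formulation or an implementation of either model. It assumes a finite horizon, finite action and percept sets, and a hypothetical `predictor` function that returns the conditional probability of the next percept given the history and the chosen action. In AIXI that role is played by the incomputable $M$, in AIS by an approximation of $S$. Here `toy_predictor` is a fixed, hand-made distribution used only so that the code runs, and all names are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple

Percept = Tuple[int, float]            # (observation, reward)
Step = Tuple[int, int, float]          # (action, observation, reward)
History = Tuple[Step, ...]
Predictor = Callable[[History, int, Percept], float]


def expectimax(history: History, horizon: int, actions: Sequence[int],
               percepts: Sequence[Percept],
               predictor: Predictor) -> Tuple[float, Optional[int]]:
    """Return (maximal expected future reward, maximizing first action)."""
    if horizon == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in actions:                      # max over the agent's actions
        value = 0.0
        for obs, reward in percepts:       # expectation over percepts
            p = predictor(history, a, (obs, reward))
            if p == 0.0:
                continue
            future, _ = expectimax(history + ((a, obs, reward),),
                                   horizon - 1, actions, percepts, predictor)
            value += p * (reward + future)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


def toy_predictor(history: History, action: int, percept: Percept) -> float:
    """Hypothetical stand-in for M or S: ignores the history entirely.

    Action 1 yields reward 1 with probability 0.8, action 0 with 0.5.
    """
    p_reward = 0.8 if action == 1 else 0.5
    return p_reward if percept[1] == 1.0 else 1.0 - p_reward


ACTIONS = [0, 1]
PERCEPTS = [(0, 0.0), (1, 1.0)]
value, action = expectimax((), 2, ACTIONS, PERCEPTS, toy_predictor)
```

With this toy predictor the sketch selects action 1, with an expected two-step reward of about 1.6. Replacing `toy_predictor` by a history-dependent predictor changes nothing in `expectimax` itself, which is the sense in which the prediction scheme and the planning step are separable.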


Juergen Schmidhuber 2003-02-25
