What is the optimal way of predicting the future, given the past? What is the best way to act so as to maximize one's future expected reward? What is the best way of searching for the solution to a novel problem, making optimal use of solutions to earlier problems?
Most previous work on these old and fundamental questions has focused on very limited settings, such as Markovian environments where the optimal next action, given past inputs, depends on the current input only [27].
We will concentrate on a much weaker and therefore much more general assumption, namely, that the environment's responses are sampled from a computable probability distribution. If even this weak assumption were not true, then we could not even formally specify the environment, let alone write reasonable scientific papers about it.
Let us first introduce some notation. $B^*$ denotes the set of finite sequences over the binary alphabet $B = \{0,1\}$, $B^\infty$ the set of infinite sequences over $B$, $\lambda$ the empty string, $B^\sharp = B^* \cup B^\infty$. $x, y, z$ stand for strings in $B^\sharp$. If $x \in B^*$ then $xy$ is the concatenation of $x$ and $y$ (e.g., if $x = 10000$ and $y = 1111$ then $xy = 100001111$). For $x \in B^*$, $l(x)$ denotes the number of bits in $x$, where $l(x) = \infty$ for $x \in B^\infty$; $l(\lambda) = 0$. $x_n$ is the prefix of $x$ consisting of the first $n$ bits, if $l(x) \geq n$, and $x$ otherwise ($x_0 = \lambda$). $\log$ denotes the logarithm with basis 2; $f, g$ denote functions mapping integers to integers. We write $f(n) = O(g(n))$ if there exist positive constants $c, n_0$ such that $f(n) \leq c\, g(n)$ for all $n > n_0$. For simplicity let us consider universal Turing Machines [64] (TMs) with input alphabet $B$ and trinary output alphabet including the symbols ``0'', ``1'', and `` '' (blank). For efficiency reasons, the TMs should have several work tapes to avoid potential quadratic slowdowns associated with 1-tape TMs. The remainder of this paper assumes a fixed universal reference TM.
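As a toy illustration (not part of the paper), the string operations just defined can be mirrored with Python strings over the binary alphabet; the names `l` and `prefix` here are ad hoc stand-ins for $l(x)$ and $x_n$.

```python
# Illustrative sketch of the notation above, using Python strings
# over the binary alphabet B = {"0", "1"}.

LAMBDA = ""  # the empty string, written lambda in the text


def l(x):
    """Number of bits in the finite string x; l(lambda) = 0."""
    return len(x)


def prefix(x, n):
    """x_n: the first n bits of x if l(x) >= n, and x otherwise."""
    return x[:n] if l(x) >= n else x


x, y = "10000", "1111"
assert x + y == "100001111"    # concatenation xy, as in the example above
assert l(LAMBDA) == 0
assert prefix(x, 3) == "100"   # x_3
assert prefix(x, 99) == x      # n exceeds l(x): the result is x itself
assert prefix(x, 0) == LAMBDA  # x_0 = lambda
```

Infinite sequences from $B^\infty$ (with $l(x) = \infty$) have no direct finite-string analogue here and are omitted.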
Now suppose bitstring $x \in B^*$ represents the data observed so far. What is its most likely continuation $y \in B^\sharp$? Bayes' theorem yields