In 1931, Kurt Gödel laid the foundations of theoretical computer science, using elementary arithmetics to build a universal programming language for encoding arbitrary proofs, given an arbitrary enumerable set of axioms. He went on to construct self-referential formal statements that claim their own unprovability, using Cantor's diagonalization trick [5] to demonstrate that formal systems such as traditional mathematics are either flawed in a certain sense or contain unprovable but true statements [11]. Since Gödel's exhibition of the fundamental limits of proof and computation, and Konrad Zuse's subsequent construction of the first working programmable computer (1935-1941), there has been a lot of work on specialized algorithms solving problems taken from more or less general problem classes. Apparently, however, one remarkable fact has so far escaped the attention of computer scientists: it is possible to use self-referential proof systems to build optimally efficient yet conceptually very simple universal problem solvers.
The initial software of our Gödel machine runs an initial, typically sub-optimal problem solver, e.g., one of Hutter's approaches [16,17] which have at least an optimal order of complexity, or some less general method [20]. Simultaneously, it runs an -optimal initial proof searcher using an online variant of Universal Search to test proof techniques, which are programs able to compute proofs concerning the system's own future performance, based on an axiomatic system encoded in , describing a formal utility function , the hardware and itself. If there is no provably good, globally optimal way of rewriting at all, then humans will not find one either. But if there is one, then itself can find and exploit it. This approach yields the first class of theoretically sound, fully self-referential, optimally efficient, general problem solvers.
After the theoretical discussion in Sections 1-5, one practical question remains: to build a particular, especially practical Gödel machine with small initial constant overhead, which generally useful theorems should one add as axioms to (as initial bias) such that the initial searcher does not have to prove them from scratch?