Proof. Target theorem (2) implicitly talks about all the other switchprogs that the proof searcher could produce in the future. To see this, consider the two alternatives of the binary decision: (1) either execute the current switchprog (set switchbit ), or (2) keep searching for proofs and switchprogs (set switchbit ) until the systematic searcher comes up with an even better switchprog. Obviously the second alternative concerns all (possibly infinitely many) potential switchprogs to be considered later. That is, if the current switchprog were not the `best', then the proof searcher would not be able to prove that setting switchbit and executing switchprog will cause higher expected reward than discarding switchprog, assuming consistency of . Q.E.D.
The initial proof searcher of Section 5 already generates all possible proofs and switchprogs in -optimal fashion. Nevertheless, since it is part of , its proofs can speak about the proof searcher itself, possibly triggering proof searcher rewrites resulting in better than merely -optimal performance.