``Only math nerds would consider $\infty$ finite.''
HSEARCH and LSEARCH (Sections 2.2, 2.3) neglect one potential source of speed-up: they are nonincremental in the sense that they do not attempt to minimize their constant slowdowns by exploiting experience collected in previous searches for solutions to earlier tasks. They simply ignore the constants -- from an asymptotic point of view, incremental search does not buy anything.
A heuristic attempt [64,65] to greatly reduce the constants through experience was called Adaptive LSEARCH or ALS -- compare related ideas by Solomonoff [69,70]. Essentially, ALS works as follows: whenever LSEARCH finds a program $q$ that computes a solution for the current problem, $q$'s probability $P(q)$ is substantially increased using a ``learning rate,'' while the probabilities of alternative programs decrease appropriately. Subsequent LSEARCHes for new problems then use the adjusted distribution $P$, and so on. A nonuniversal variant of this approach was used to solve reinforcement learning (RL) tasks in partially observable environments unsolvable by traditional RL algorithms.
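The ALS-style update can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary `probs`, the helper name `als_update`, and the learning rate `gamma` are assumptions introduced here. The key property is that mass shifts toward the successful program while the distribution stays normalized.

```python
# Hypothetical sketch of an ALS-style probability update: boost the
# program that just solved the current task, rescale the rest so the
# distribution still sums to 1. Names and learning rate are illustrative.

def als_update(probs, winner, gamma=0.5):
    """Shift probability mass toward the successful program `winner`.

    New P(winner) = P(winner) + gamma * (1 - P(winner)); all other
    probabilities are scaled by a common factor to preserve sum = 1.
    """
    if probs[winner] >= 1.0:
        return dict(probs)  # nothing left to redistribute
    new_probs = dict(probs)
    new_probs[winner] = probs[winner] + gamma * (1.0 - probs[winner])
    # Remaining mass shrinks from (1 - old) to (1 - new); rescale accordingly.
    scale = (1.0 - new_probs[winner]) / (1.0 - probs[winner])
    for q in probs:
        if q != winner:
            new_probs[q] = probs[q] * scale
    return new_probs

P = {"q1": 0.25, "q2": 0.25, "q3": 0.5}
P = als_update(P, "q1", gamma=0.5)
# "q1" gains probability; "q2" and "q3" shrink proportionally.
```

Subsequent searches would then allocate time to candidate programs according to the adjusted distribution, which is what makes the scheme incremental.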
Each LSEARCH invoked by ALS is bias-optimal with respect to the most recent adjustment of $P$. On the other hand, the rather arbitrary $P$-modifications themselves are not necessarily optimal. They might lead to overfitting in the following sense: modifications of $P$ after the discovery of a solution to problem 1 may actually be harmful, slowing down the search for a solution to problem 2, and so on. This can cause a loss of near-bias-optimality with respect to the initial bias during exposure to subsequent tasks. Furthermore, ALS has a fixed, prewired method for changing $P$ and cannot improve this method through experience. The main contribution of this paper is to overcome all such drawbacks in a principled way.