As mentioned above, LS is not necessarily optimal for ``incremental'' learning problems where experience with previous problems may help to reduce future search costs. To make an incremental search method out of non-incremental LS, we introduce a simple, heuristic, adaptive LS extension (ALS) that uses experience with previous problems to adaptively modify LS' underlying probability distribution. ALS essentially works as follows: whenever LS finds a program $q$ that computes a solution for the current problem, the probabilities of $q$'s instructions $q_1, q_2, \ldots, q_{l(q)}$ are increased (here $q_i$ denotes $q$'s $i$-th instruction, and $l(q)$ denotes $q$'s length -- if LS did not find a solution ($q$ is the empty program), then $l(q)$ is defined to be 0). The probability adjustment is controlled by a learning rate $\gamma$ ($0 < \gamma < 1$). ALS is related to the linear reward-inaction algorithm (e.g., [Kaelbling, 1993]) -- the main difference is that ALS uses LS to search through program space as opposed to a single action space. As in section 2, the probability distribution $D_P$ is determined by the variable probability matrix $P$, where $P_{ij}$ denotes the probability of instruction $a_j$ at program address $i$. Initially, all $P_{ij} = \frac{1}{n_{ops}}$, where $n_{ops}$ is the number of possible instructions. However, given a sequence of problems $N_1, N_2, \ldots, N_r$, the $P_{ij}$ may undergo changes caused by ALS:
ALS (problems $(N_1, \ldots, N_r)$, variable matrix $P$)
for $i := 1$ to $r$ do:
$q$ := Levin search($N_i$, $P$); Adapt($q$, $P$).
Adapt(program $q$, variable matrix $P$)
for $i := 1$ to $l(q)$, $j := 1$ to $n_{ops}$ do:
if ($q_i = a_j$) then $P_{ij} := P_{ij} + \gamma (1 - P_{ij})$
else $P_{ij} := (1 - \gamma) P_{ij}$
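To make the update concrete, here is a minimal Python sketch of Adapt and ALS as given above. The names \texttt{n\_ops} and \texttt{levin\_search}, the concrete learning rate, and the representation of programs as lists of instruction indices are assumptions of this sketch, not part of the original formulation; \texttt{levin\_search} stands for any implementation of LS that returns the solution program found, or the empty program on failure.

\begin{verbatim}
import numpy as np

def adapt(q, P, gamma=0.05):
    # Reward-inaction style update (gamma is the learning rate,
    # 0 < gamma < 1; the concrete value 0.05 is an assumption).
    # q: solution program as a list of instruction indices
    #    (empty list if LS found no solution, i.e. l(q) = 0).
    # P: matrix with P[i, j] = probability of instruction j at
    #    program address i; each row of P sums to 1.
    for i, q_i in enumerate(q):
        for j in range(P.shape[1]):   # j ranges over n_ops instructions
            if j == q_i:
                P[i, j] += gamma * (1.0 - P[i, j])
            else:
                P[i, j] *= (1.0 - gamma)
        # each row of P still sums to 1 after the update

def als(problems, P, levin_search):
    # Run LS on each problem in turn, adapting P after each run.
    for problem in problems:
        q = levin_search(problem, P)  # [] if no solution was found
        adapt(q, P)

# example: uniform initial matrix for programs of up to 10
# instructions drawn from n_ops = 4 possible instructions:
# P = np.full((10, 4), 1.0 / 4)
\end{verbatim}

Note that the update leaves each row of $P$ normalized: the increase $\gamma (1 - P_{ij})$ for the matched instruction exactly balances the total decrease $\gamma \sum_{k \neq j} P_{ik}$ over the remaining instructions.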
Critique of adaptive LS. Although ALS seems a reasonable first step towards making LS adaptive (and actually leads to very nice experimental results -- see section 5), there is no theoretical proof that it will generate only probability modifications that speed up the process of finding solutions to new tasks -- sometimes ALS may produce harmful instead of beneficial results. To address this issue, in the next section we augment ALS by a recent backtracking technique called ``Environment-Independent Reinforcement Acceleration'' (EIRA). EIRA ensures that the system will keep only probability modifications representing a lifelong history of performance improvements.
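As a rough illustration of the backtracking idea (EIRA itself is described in the next section), the sketch below maintains a stack of probability modifications, each stored with the time and cumulative reward at which it was made, and undoes the most recent modifications until each surviving one marks the beginning of a period with a higher reward per time than its predecessor. The entry format, the undo closures, and the exact acceptance test are simplifying assumptions made for illustration only; the next section gives the precise criterion.

\begin{verbatim}
def eira_backtrack(stack, t_now, r_now):
    # stack entries: (t, r, undo) = time and cumulative reward
    # when a modification was made, plus a function undoing it.
    # Assumes t_now exceeds all stored checkpoint times.
    # Pop and undo modifications until each surviving checkpoint
    # started a period with a higher reward/time ratio than the
    # checkpoint before it (simplified "success story" test).
    while len(stack) >= 2:
        t_old, r_old, _ = stack[-2]
        t_new, r_new, undo = stack[-1]
        if (r_now - r_new) / (t_now - t_new) > \
           (r_now - r_old) / (t_now - t_old):
            break      # criterion holds; keep remaining stack
        undo()         # revert the most recent modification
        stack.pop()
\end{verbatim}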