Critique of adaptive LS. Although ALS is a reasonable first step towards making LS adaptive (and in fact leads to strong experimental results -- see section 3.5), there is no proof that it will generate only probability modifications that speed up the process of finding solutions to new tasks. Like any learning algorithm, ALS may sometimes produce harmful rather than beneficial bias shifts, depending on the environment. To address this issue, we simply plug ALS into the basic cycle from section 2: SSA ensures that the system keeps only those probability modifications representing a lifelong history of performance improvements.
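SSA's filtering of probability modifications can be sketched in a few lines. This is a hedged, simplified sketch (the names `ssc_holds` and `ssa_checkpoint` are hypothetical, and each bias shift is represented by an undo closure): the success-story criterion requires that reward per unit time since each still-valid checkpoint strictly increase, and recent bias shifts are undone until it does.

```python
# Hedged sketch of SSA's success-story criterion (SSC). Each stack entry
# records (checkpoint time, cumulative reward at that time, undo closure
# restoring the policy to its state before the guarded bias shift).

def ssc_holds(stack, now, total_reward):
    """Reward per unit time since birth (time 0) and since each surviving
    checkpoint must strictly increase; checkpoint times are assumed < now."""
    rates = [total_reward / now]
    rates += [(total_reward - r) / (now - t) for (t, r, _) in stack]
    return all(a < b for a, b in zip(rates, rates[1:]))

def ssa_checkpoint(stack, now, total_reward, undo):
    # Undo the most recent bias shifts until the SSC holds for the
    # survivors, then record a new checkpoint guarding the latest shift.
    while stack and not ssc_holds(stack, now, total_reward):
        _, _, undo_fn = stack.pop()
        undo_fn()  # restore the policy to its pre-modification state
    stack.append((now, total_reward, undo))
```

The key design point is that SSA never judges a modification in isolation: a checkpoint survives only as long as the reward rate since its creation keeps beating the rates of all earlier surviving checkpoints.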
ALS as primitive for SSA cycle. At any given time, the learner's current policy is the variable matrix above. To plug ALS into SSA, we replace steps 1 and 3 of section 2's basic cycle so that they invoke Levin search and Adapt, respectively:
Each call of Adapt causes a bias shift for future learning. Between two calls of Adapt, a certain amount of time is consumed by Levin search (how time is measured will be detailed in the section on experiments). As always, the goal is to receive as much reward as quickly as possible, by generating policy changes that minimize the computation time required by future calls of Levin search and Adapt.
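The resulting loop might be sketched as follows. All names here are hypothetical stand-ins: `levin_search`, `adapt`, and `checkpoint` correspond to LS, Adapt, and SSA checkpointing, the policy is simplified to a plain probability table, and `clock` supplies the measured time.

```python
def ssa_als_cycle(tasks, policy, levin_search, adapt, checkpoint, clock):
    """Hedged sketch of ALS plugged into the SSA cycle: each task is
    solved by Levin search under the current policy, Adapt then shifts
    the policy toward the found solution, and an SSA checkpoint guards
    that bias shift so it can later be undone if it proves harmful."""
    total_reward = 0.0
    for task in tasks:
        solution, reward = levin_search(task, policy)  # search time passes
        total_reward += reward
        backup = dict(policy)                # snapshot before the shift
        adapt(policy, solution)              # bias shift for future tasks
        undo = lambda p=policy, b=backup: (p.clear(), p.update(b))
        checkpoint(clock(), total_reward, undo)
    return total_reward
```

Note that the undo closure captures a full policy snapshot; in practice only the modified entries need restoring, but the snapshot keeps the sketch short.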
Partially observable maze problems. The next subsections describe experiments validating the usefulness of LS, ALS, and SSA. We begin with an illustrative application: a partially observable maze with more states and obstacles than those presented in other POE work (see, e.g., [#!Cliff:94!#]), showing how LS by itself can solve POEs with large state spaces but low-complexity solutions (Q-learning variants fail to solve these tasks). We then present experimental case studies with multiple, increasingly difficult tasks (inductive transfer). ALS can use previous experience to speed up the process of finding new solutions, and ALS plugged into the SSA cycle (SSA+ALS for short) always outperforms ALS by itself.
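Why LS handles large state spaces with low-complexity solutions can be seen from a toy sketch of Levin-style search (hypothetical interface; a real implementation runs candidate programs on an interpreter under the stated time bounds): each candidate program receives time proportional to its probability under the policy, and the total budget doubles each phase, so short, high-probability programs are tried long before low-probability ones, independently of the size of the environment's state space.

```python
from itertools import product

def levin_search(alphabet, probs, is_solution, max_phase=20):
    """Toy sketch of Levin-style search. In phase i, each program p (a
    tuple of instructions over `alphabet`) gets a time budget of roughly
    2**i * P(p) steps, where P(p) is the product of its instructions'
    probabilities under the policy. `is_solution(prog, steps)` simulates
    prog for at most `steps` steps and reports success."""
    for i in range(1, max_phase + 1):
        budget = 2 ** i
        for length in range(1, i + 1):
            for prog in product(alphabet, repeat=length):
                p = 1.0
                for instr in prog:
                    p *= probs[instr]
                steps = int(budget * p)      # time allotted to this program
                if steps >= 1 and is_solution(prog, steps):
                    return prog
    return None   # no solution within the phase limit
```

Biasing `probs` toward instructions that occurred in earlier solutions (as Adapt does) raises P(p) for similar programs, which is exactly why a solved task can speed up the search on related tasks.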