
Plugging ALS into the Basic SSA Cycle

Critique of adaptive LS. Although ALS is a reasonable first step towards making LS adaptive (and in fact leads to strong experimental results, see section 3.5), there is no proof that it will generate only probability modifications that speed up the process of finding solutions to new tasks. Like any learning algorithm, ALS may sometimes produce harmful rather than beneficial bias shifts, depending on the environment. To address this issue, we simply plug ALS into the basic cycle from section 2. SSA ensures that the system keeps only those probability modifications representing a lifelong history of performance improvements.
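To make this concrete, here is a minimal Python sketch of the kind of check SSA performs at evaluation time (the success-story criterion of section 2): the most recent policy modifications are undone whenever the reward-per-time intervals do not keep improving toward the present. The stack layout, the NumPy-style matrix $P$, and the name enforce_success_story are illustrative assumptions for this sketch only, not the paper's implementation.

def enforce_success_story(stack, P, R_now, t_now):
    """Illustrative sketch of SSA backtracking (not the paper's code).

    Each stack entry is (t_check, R_check, saved_columns): time and
    cumulative reward at a checkpoint, plus copies of the policy
    columns modified right after it. Assumes t_now exceeds all
    checkpoint times on the stack.
    """
    while stack:
        t_top, R_top, saved_top = stack[-1]
        # Reference point: the previous checkpoint, or the start of life.
        t_prev, R_prev = (stack[-2][0], stack[-2][1]) if len(stack) > 1 else (0.0, 0.0)
        # Success-story criterion: reward per time since the most recent
        # checkpoint must exceed reward per time since the one before it.
        if (R_now - R_top) / (t_now - t_top) > (R_now - R_prev) / (t_now - t_prev):
            break                              # history of improvements intact
        stack.pop()                            # undo the harmful bias shift:
        for i, column in saved_top.items():
            P[:, i] = column                   # restore the saved policy columns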

ALS as primitive for SSA cycle. At a given time, the learner's current policy is the variable matrix $P$ above. To plug ALS into SSA, we replace steps 1 and 3 in section 2's basic cycle by:

1.
If the current basic cycle's problem is $N_i$, then set $q :=$ Levin search$(N_i, P)$. If a solution was found, generate a reward of $+1.0$. Set EVALUATION CRITERION $= TRUE$. The next action will be a call of Adapt, which will change the policy $P$.

3.
Push copies of those columns $P_i$ of matrix $P$ that are about to be modified by Adapt onto the stack $\cal S$, and call Adapt($q, P$).

Each call of Adapt causes a bias shift for future learning. In between two calls of Adapt, a certain amount of time will be consumed by Levin search (details about how time is measured will follow in the section on experiments). As always, the goal is to receive as much reward as quickly as possible, by generating policy changes that minimize the computation time required by future calls of Levin search and Adapt.
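For illustration, a minimal Python sketch of the two replaced steps follows. The callables levin_search, touched_columns, adapt, and clock, as well as the NumPy-style matrix P and the function name ssa_step, are assumptions introduced for this sketch only; they stand in for the routines described above and in the previous subsection.

def ssa_step(N_i, P, stack, levin_search, adapt, touched_columns, clock, reward_so_far):
    """Hypothetical sketch of steps 1 and 3 of the SSA basic cycle with ALS.

    Assumed callables (not the paper's actual routines):
      levin_search(N_i, P) -> solving program q, or None
      touched_columns(q)   -> indices of the columns P_i that Adapt will modify
      adapt(q, P)          -> shifts the probabilities in those columns toward q
      clock()              -> current measure of consumed computation time
    """
    # Step 1: run Levin search on task N_i under the current policy P.
    q = levin_search(N_i, P)
    if q is not None:
        reward_so_far += 1.0              # reward +1.0 for a solved task
    # Step 3: before Adapt changes P, push copies of the columns it will
    # modify onto the stack, so SSA can later undo this bias shift.
    saved = {i: P[:, i].copy() for i in touched_columns(q)}
    stack.append((clock(), reward_so_far, saved))
    adapt(q, P)                           # bias shift for future Levin searches
    return reward_so_far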

Partially observable maze problems. The next subsections will describe experiments validating the usefulness of LS, ALS, and SSA. To begin with, in an illustrative application with a partially observable maze that has more states and obstacles than those presented in other POE work (see, e.g., [Cliff:94]), we will show how LS by itself can solve POEs with large state spaces but low-complexity solutions (Q-learning variants fail to solve these tasks). Then we will present experimental case studies with multiple, increasingly difficult tasks (inductive transfer). ALS can use previous experience to speed up the process of finding new solutions, and ALS plugged into the SSA cycle (SSA+ALS for short) always outperforms ALS by itself.



