
Conclusion

SSA collects more information about the long-term effects of policy changes and shifts of inductive bias than previous RL schemes do. In contrast to traditional RL approaches, time is not reset at trial boundaries. Instead, we measure the total reward received and the total time consumed by learning and policy tests during all trials following some bias shift: bias shifts are evaluated by measuring their long-term effects on later learning. A bias shift is undone once there is empirical evidence that it has not set the stage for a long-term performance improvement. No bias shift is safe forever, but in many regular environments the survival probability of a useful bias shift will approach unity if it can justify itself by contributing to long-term reward accelerations.
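To make this bookkeeping concrete, here is a minimal sketch of an SSA checkpoint in Python. It is an illustration under stated assumptions, not the paper's implementation: the stack-entry layout (time of a bias shift, cumulative reward measured at that time, an undo callback) and all identifiers are hypothetical. The success-story criterion (SSC) requires that the reward intake per unit time measured since each still-valid bias shift strictly increases from older to newer shifts; shifts violating this are popped and undone.

    # Illustrative sketch only; entry layout and names are hypothetical.
    # Each stack entry: (shift_time, reward_at_shift, undo), where undo()
    # restores the policy state that preceded the corresponding bias shift.
    # The stack is ordered oldest (bottom) to newest (top).

    def ssc_holds(stack, t_now, r_now):
        """Success-story criterion: reward per unit time since each
        still-valid bias shift strictly increases along the stack."""
        prev_rate = r_now / t_now  # intake speed since time 0 (assumes t_now > 0)
        for shift_time, reward_at_shift, _undo in stack:
            # t_now > shift_time at every checkpoint, so no zero division
            rate = (r_now - reward_at_shift) / (t_now - shift_time)
            if rate <= prev_rate:
                return False
            prev_rate = rate
        return True

    def ssa_checkpoint(stack, t_now, r_now):
        """Undo the most recent bias shifts until the SSC holds again;
        surviving shifts have so far accelerated long-term reward intake."""
        while stack and not ssc_holds(stack, t_now, r_now):
            _shift_time, _reward_at_shift, undo = stack.pop()
            undo()

Popping from the top first means the most recent, least-tested shifts are undone before older ones that have had more time to justify themselves.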

Limitations. (1) Like any approach to inductive transfer, ours suffers from the fundamental limitations mentioned in the first paragraph of this paper. (2) Especially at the beginning of the training phase, ALS may suffer from a possibly large constant buried in the $O()$ notation used to describe LS's optimal order of complexity. (3) We do not gain much by applying our methods to, say, simple ``Markovian'' mazes for which efficient RL methods based on dynamic programming already exist (our methods are of interest, however, in certain more realistic situations where standard RL methods fail). (4) SSA does not make much sense in ``unfriendly'' environments in which reward constantly decreases no matter what the learner does. In such environments SSC is satisfiable only in a trivial way. True success stories are possible only in ``friendly'', regular environments that do allow for long-term reward speed-ups (though this does include certain zero-sum reward games).

Outlook. Despite these limitations, we feel that we have barely scratched the surface of SSA's potential for solving realistic RL problems involving inductive transfer. In future work we intend to plug a whole variety of well-known algorithms into SSA and let it pick and combine the best, problem-specific ones.

