
In ``Learning to learn'', pages 293-309, Kluwer, 1997

Jürgen Schmidhuber, Jieyu Zhao, Nicol N. Schraudolph
IDSIA, Lugano, Switzerland



A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm itself has modifiable components represented as part of the policy, we speak of a self-modifying policy (SMP). SMPs can modify the way they modify themselves, and so on. They are of interest in situations where the initial learning algorithm itself can be improved by experience; this is what we call ``learning to learn''. How can we force a (stochastic) SMP to trigger better and better self-modifications? The success-story algorithm (SSA) addresses this question in a lifelong reinforcement learning context. During the learner's lifetime, SSA is occasionally invoked at times computed according to the SMP itself. SSA uses backtracking to undo those SMP-generated SMP-modifications that have not been empirically observed to trigger lifelong reward accelerations, measured up until the current SSA call; this evaluates the long-term effects of SMP-modifications that set the stage for later SMP-modifications. SMP-modifications that survive SSA represent a lifelong success history. Until the next SSA call, they form the basis for additional SMP-modifications. Solely through self-modifications, our SMP/SSA-based learners solve a complex task in a partially observable environment (POE) whose state space is far bigger than most reported in the POE literature.
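For concreteness, here is a minimal Python sketch of the SSA backtracking step described above, under the assumption that each surviving self-modification is recorded as a checkpoint storing its time, the cumulative reward collected up to that time, and the old policy values needed to undo it. The names (Checkpoint, ssa_call, undo_info) are illustrative, not taken from the paper; the criterion enforced is that average reward per time since each surviving checkpoint, read from oldest to newest, must be strictly increasing.

    from dataclasses import dataclass

    @dataclass
    class Checkpoint:
        time: float        # time at which the self-modification was made
        reward: float      # cumulative reward collected up to that time
        undo_info: dict    # previous policy values, used to reverse the modification

    def ssc_holds(stack, t, R):
        # Success-story criterion: average reward since each surviving checkpoint,
        # from oldest to newest, must exceed the lifelong average R/t and keep increasing.
        prev = R / t
        for cp in stack:
            cur = (R - cp.reward) / (t - cp.time)
            if cur <= prev:
                return False
            prev = cur
        return True

    def ssa_call(stack, policy, t, R):
        # Backtracking step: undo the most recent surviving self-modifications
        # until the success-story criterion holds for all that remain.
        while stack and not ssc_holds(stack, t, R):
            cp = stack.pop()
            policy.update(cp.undo_info)   # restore the overwritten policy components
        return stack

Checkpoints that survive repeated ssa_call invocations correspond to the lifelong success history mentioned above; everything made since the last surviving checkpoint is what later SMP-modifications build on.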

