
DS Advantage 4: Non-Hierarchical Abstract Credit Assignment

Hierarchical learning of macros and reusable subprograms is of interest but limited. Often there are non-hierarchical (nevertheless exploitable) regularities in solution space. For instance, suppose we can obtain solution B by replacing every action "turn(right)" in solution A by "turn(left)". B will then be regular in the sense that it conveys little additional conditional algorithmic information, given A [Solomonoff, 1964; Kolmogorov, 1965; Chaitin, 1969; Li and Vitányi, 1993], that is, there is a short algorithm computing B from A. Hence B should not be hard to learn by a smart RL system that already found A. While DPRL cannot exploit such regularities in any obvious manner, DS in general algorithm spaces does not encounter any fundamental problems in this context. For instance, all that is necessary to find B may be a modification of the parameter "right" of a single instruction "turn(right)" in a repetitive loop computing A [Schmidhuber et al., 1997b].
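The turn(right)/turn(left) example can be illustrated with a small sketch. It is not taken from the cited work: the `LoopProgram` class, its fields, the `run` and `swap_turns` helpers, and the "forward" action are all hypothetical names chosen for illustration. The sketch only assumes that solution A is produced by a short loop containing one "turn" instruction, so that editing a single parameter of that instruction yields B.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LoopProgram:
    """Hypothetical short program: repeat (move forward, then turn)."""
    repeats: int          # number of loop iterations
    steps_per_side: int   # "forward" actions per iteration
    turn_arg: str         # parameter of the single turn instruction


def run(program: LoopProgram) -> List[str]:
    """Expand the loop into the flat action sequence it computes."""
    actions: List[str] = []
    for _ in range(program.repeats):
        actions.extend(["forward"] * program.steps_per_side)
        actions.append(f"turn({program.turn_arg})")
    return actions


def swap_turns(actions: List[str]) -> List[str]:
    """Replace every turn(right) by turn(left) at the action level."""
    return ["turn(left)" if a == "turn(right)" else a for a in actions]


# Program computing solution A.
program_a = LoopProgram(repeats=4, steps_per_side=2, turn_arg="right")

# One parameter edit in program space gives the program for solution B.
program_b = replace(program_a, turn_arg="left")

solution_a = run(program_a)
solution_b = run(program_b)

# Positions at which the two flat action sequences differ.
changed_actions = sum(a != b for a, b in zip(solution_a, solution_b))
```

In this toy setting the two action sequences differ in one position per loop iteration, while the two programs differ in a single field. A search over programs can therefore reach B from A in one small step, whereas a method operating on the flat action sequence has no comparable shortcut.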



Juergen Schmidhuber 2003-02-19

