Hierarchical learning of macros and reusable subprograms is of interest but
limited in scope. Often there are non-hierarchical (but nevertheless exploitable)
regularities in solution space. For instance, suppose we can obtain
solution B by replacing every action "turn(right)" in solution A by
"turn(left)." B will then be regular in the sense that it conveys
little additional conditional algorithmic information, given A
[Solomonoff, 1964; Kolmogorov, 1965; Chaitin, 1969; Li and Vitányi, 1993], that is,
there is a short algorithm computing B from A. Hence B
should not be hard to learn by a smart RL system that already found A.
While DPRL cannot exploit such regularities in any obvious manner,
DS in general algorithm spaces does not encounter
any fundamental problems in this context. For instance,
all that is necessary
to find B may be a modification of the parameter "right" of a single
instruction "turn(right)" in a repetitive loop
computing A [Schmidhuber et al., 1997b].
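
The loop example can be made concrete with a small sketch. Everything named here is hypothetical and chosen only for illustration: the `LoopProgram` class, its `direction` and `repeats` parameters, and the "forward" action are assumptions, not the representation used in the cited work. The sketch shows that changing one parameter of the program computing A yields a program whose output B equals A with every "turn(right)" replaced by "turn(left)".

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class LoopProgram:
    """Hypothetical program: repeat (forward, turn(direction)) `repeats` times."""
    direction: str
    repeats: int

    def run(self) -> list:
        # Unroll the loop into the flat action sequence (the "solution").
        actions = []
        for _ in range(self.repeats):
            actions.append("forward")
            actions.append(f"turn({self.direction})")
        return actions


def parameter_edits(p: LoopProgram, q: LoopProgram) -> int:
    """Count how many program parameters differ between p and q."""
    return sum(1 for f in fields(p) if getattr(p, f.name) != getattr(q, f.name))


# Program computing solution A (assumed already found by the learner).
program_a = LoopProgram(direction="right", repeats=4)
solution_a = program_a.run()

# Program for B: the same program with a single parameter modified.
program_b = replace(program_a, direction="left")
solution_b = program_b.run()

# B as defined in the text: A with every turn(right) replaced by turn(left).
substituted = ["turn(left)" if a == "turn(right)" else a for a in solution_a]

print(solution_b == substituted)              # the two definitions of B agree
print(parameter_edits(program_a, program_b))  # size of the change in program space
```

In action space, A and B differ in every second action, and the number of differing actions grows with the loop count. In program space, they differ in one parameter regardless of how long the solutions are, which is the sense in which B conveys little additional information given A.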
Juergen Schmidhuber
2003-02-19