Sun Yi, Faustino Gomez, Mark Ring, Jürgen Schmidhuber. Incremental
Basis Construction from Temporal Difference Error. Proceedings
of the 28th International Conference on Machine Learning (ICML-11),
2011.
Abstract
In many reinforcement learning (RL) systems, the value
function is approximated as a linear combination of a fixed set of
basis functions. Performance can be improved by adding to this set.
Previous approaches construct a series of basis functions that in
sufficient number can eventually represent the value function. In
contrast, we show that there is a single, ideal basis function,
which can directly represent the value function. Its addition to the
set immediately reduces the error to zero, without changing existing
weights. Moreover, this ideal basis function is simply the value
function that results from replacing the MDP's reward function with
its Bellman error. This result suggests a novel method for improving
value-function estimation: a primary reinforcement learner estimates
its value function using its present basis functions; it then sends
its TD error to a secondary learner, which interprets that error as
a reward function and estimates the corresponding value function;
the resulting value function then becomes the primary learner's new
basis function. We present both batch and online versions in
combination with incremental basis projection, and demonstrate that
the performance is superior to existing methods, especially in the
case of large discount factors.
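The core identity in the abstract can be checked directly in the batch setting: if the Bellman error of any current estimate is treated as a reward function, the value function it induces is exactly the residual, so adding it as a basis function with weight one recovers the true value function. The sketch below verifies this on a hypothetical 5-state chain MDP (the MDP, basis, and weights are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical 5-state chain MDP: deterministic right-moves,
# absorbing final state, reward just before the terminal state.
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1] = 1.0
P[n - 1, n - 1] = 1.0        # absorbing terminal state
r = np.zeros(n)
r[n - 2] = 1.0               # single rewarding state

# True value function: V* = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(n) - gamma * P, r)

# Primary learner: a crude linear estimate on a constant basis
# with an arbitrary weight (the identity holds for ANY estimate).
Phi = np.ones((n, 1))
w = np.array([0.3])
V_hat = Phi @ w

# Bellman error of the current estimate, interpreted as a reward.
bellman_err = r + gamma * P @ V_hat - V_hat

# Secondary learner: value function of the Bellman-error "MDP".
u = np.linalg.solve(np.eye(n) - gamma * P, bellman_err)

# Adding u as a new basis function with weight 1 yields V* exactly,
# with no change to the existing weight w.
print(np.allclose(V_hat + u, V_true))  # → True
```

The check follows from (I - gamma P)(V_hat + u) = (I - gamma P) V_hat + bellman_err = r, so V_hat + u satisfies the Bellman equation.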