Solving POMDPs with Levin Search and EIRA

Marco Wiering
IDSIA
Jürgen Schmidhuber
IDSIA

Abstract:

Partially observable Markov decision problems (POMDPs) have recently received much attention in the reinforcement learning community. No attention, however, has been paid to Levin's universal search through program space (LS), which is theoretically optimal for a wide variety of search problems, including many POMDPs. Experiments in this paper first show that LS can solve partially observable mazes (POMs) involving many more states and obstacles than those solved by various previous authors (here, LS also easily outperforms Q-learning). We then note, however, that LS is not necessarily optimal for "incremental" learning problems, where experience with previous problems may help to reduce future search costs. For this reason, we introduce an adaptive extension of LS (ALS) which uses experience to increase the probabilities of instructions occurring in successful programs found by LS. To deal with cases where ALS does not lead to long-term performance improvement, we use the recent technique of "environment-independent reinforcement acceleration" (EIRA) as a safety belt (EIRA is currently the only known method that guarantees a lifelong history of reward accelerations). Experiments with additional POMs demonstrate: (a) ALS can dramatically reduce the search time consumed by successive calls of LS; (b) additional significant speed-ups can be obtained by combining ALS and EIRA.
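
For readers unfamiliar with these methods, the following minimal Python sketch illustrates the two mechanisms the abstract refers to: Levin search's allocation of runtime in proportion to a program's probability, and an ALS-style update that increases the probabilities of instructions appearing in a successful program. The instruction set, the run_program interface, and the learning rate gamma are illustrative assumptions for this sketch, not the actual implementation used in the paper.

    import itertools

    # Illustrative instruction set; these names are assumptions for the
    # sketch, not the instruction set used in the paper.
    INSTRUCTIONS = ("left", "right", "forward")

    def levin_search(run_program, instr_probs, max_phase=16):
        """Sketch of Levin search (LS): in phase i, candidate program p is
        granted a time budget proportional to P(p) * 2**i, where P(p) is the
        product of its instruction probabilities. run_program(p, budget) is
        assumed to return True iff p solves the task within the budget."""
        for phase in range(1, max_phase + 1):
            for length in range(1, phase + 1):
                for program in itertools.product(instr_probs, repeat=length):
                    prob = 1.0
                    for instr in program:
                        prob *= instr_probs[instr]
                    budget = int(prob * 2 ** phase)
                    if budget >= 1 and run_program(program, budget):
                        return program  # first solution found in phase order
        return None

    def als_update(program, instr_probs, gamma=0.05):
        """ALS-style update sketch: shift probability mass toward the
        instructions of a successful program, then renormalize so the
        probabilities again sum to one."""
        for instr in program:
            instr_probs[instr] += gamma
        total = sum(instr_probs.values())
        for instr in instr_probs:
            instr_probs[instr] /= total

    # Example usage with a toy task: succeed iff the program ends in "forward".
    probs = {i: 1.0 / len(INSTRUCTIONS) for i in INSTRUCTIONS}
    solved = levin_search(lambda p, b: p[-1] == "forward", probs)
    if solved:
        als_update(solved, probs)  # bias subsequent searches toward this solution

Because the update only reweights instruction probabilities, successive calls of levin_search find programs built from previously successful instructions sooner, which is the source of the speed-ups reported in the abstract; the safety-belt role of EIRA (undoing policy changes that do not accelerate reward intake) is not captured by this sketch.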



