
Reinforcement Learning in Markovian and Non-Markovian Environments

Jürgen Schmidhuber, TUM

In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3 (NIPS 3), pages 500-506. San Mateo, CA: Morgan Kaufmann, 1991.


This work addresses three problems in reinforcement learning and adaptive neuro-control: (1) non-Markovian interfaces between learner and environment, (2) on-line learning based on system realization, and (3) vector-valued adaptive critics. It describes an algorithm based on system realization and on two interacting, fully recurrent, continually running networks that may learn in parallel. Problems with parallel learning are attacked by 'adaptive randomness'. It also describes how interacting model/controller systems can be combined with vector-valued 'adaptive critics' (previous critics have been scalar).

Juergen Schmidhuber 2003-02-25
