
DS Advantage 1: No States

Finite-time convergence proofs for DPRL [Kearns and Singh, 1999] require (among other things) that the environment can be quantized into a finite number of discrete states, and that the topology describing possible transitions from one state to the next, given a particular action, is known in advance. Even if the real world were quantizable into a discrete state space, however, for all practical purposes this space would be inaccessible and remain unknown. Current proofs do not cover apparently minor deviations from the basic principle, such as the world-class RL backgammon player [Tesauro, 1994], which uses a nonlinear function approximator to deal with a large but finite number of discrete states and, for the moment at least, seems a bit like a miracle without full theoretical foundation. Prior knowledge about the topology of a network connecting discrete states is also required by algorithms for partially observable Markov decision processes (POMDPs), although these are more powerful than standard DPRL, e.g., [Kaelbling et al., 1995; Littman et al., 1995]. In general, however, we do not know a priori how to quantize a given environment into meaningful states.

DS, however, completely avoids the issues of value functions and state identification -- it simply tests policies and keeps those that work best.
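The test-and-keep loop described above can be sketched in a few lines. The environment, the one-parameter policy, and the reward values below are all hypothetical toy constructions chosen for illustration; the point is only that the search loop never builds a value function or identifies states -- it evaluates whole policies by their episode returns and retains the best candidate found so far.

```python
import random

# Hypothetical toy task: the agent starts at position 0 on a line and is
# rewarded for sitting at position 5; observations are the current
# position, actions are +1 or -1.
def evaluate(policy, steps=20):
    """Run one episode and return the total reward earned by `policy`."""
    pos, total = 0, 0.0
    for _ in range(steps):
        pos += policy(pos)                   # policy maps observation -> action
        total += 1.0 if pos == 5 else -0.1   # reward only at the goal position
    return total

def make_policy(theta):
    """A one-parameter policy: move right while below the threshold theta."""
    return lambda pos: 1 if pos < theta else -1

# Direct search: sample candidate policy parameters, test each candidate
# in the environment, and keep whichever performed best -- no value
# function, no state identification.
random.seed(0)
best_theta, best_return = None, float("-inf")
for _ in range(50):
    theta = random.uniform(-10.0, 10.0)
    ret = evaluate(make_policy(theta))
    if ret > best_return:
        best_theta, best_return = theta, ret
```

Random sampling is only the simplest instantiation; the same evaluate-and-select skeleton underlies hill climbing and evolutionary policy search, which merely replace the proposal step.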


Juergen Schmidhuber 2003-02-19
