
Finite-time convergence proofs for DPRL [Kearns and Singh, 1999] require
(among other things) that the environment can be quantized into a
finite number of discrete states, and that the topology describing
possible transitions from one state to the next, given a particular
action, is known in advance. Even if the real world were quantizable
into a discrete state space, however, for all practical purposes this
space would be inaccessible and remain unknown. Current proofs do not
cover apparently minor deviations from the basic principle, such as
the world-class RL backgammon player [Tesauro, 1994], which uses
a nonlinear function approximator to deal with a large but finite
number of discrete states and, for the moment at least, seems a bit like
a miracle without full theoretical foundation. Prior knowledge about the
topology of a network connecting discrete states is also required by
algorithms for partially observable Markov decision processes (POMDPs),
although these are more powerful than standard DPRL, e.g.,
[Kaelbling et al., 1995; Littman et al., 1995].
In general, however, we do not know *a priori* how to quantize a given
environment into meaningful states.
DS, however, completely avoids the issues of
value functions and state identification:
it simply tests policies and keeps
those that work best.
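
This policy-testing loop can be sketched as a simple stochastic hill climber over policy parameters. The toy corridor task, the one-parameter threshold policy, and all numeric constants below are illustrative assumptions, not part of the original text; the point is only that the policy is evaluated as a black box, with no value function and no state model:

```python
import random

GOAL = 5      # hypothetical corridor length (illustrative only)
HORIZON = 20  # maximum steps per episode

def rollout(p_right, rng):
    """One episode under a one-parameter stochastic policy.

    The agent moves right with probability p_right (left otherwise)
    and is rewarded for reaching GOAL quickly. The policy is treated
    as a black box: no value function, no state identification.
    """
    pos, reward = 0, 0.0
    for _ in range(HORIZON):
        pos = max(0, pos + (1 if rng.random() < p_right else -1))
        if pos >= GOAL:
            return reward + 1.0   # bonus for reaching the goal
        reward -= 0.01            # small per-step cost
    return reward

def evaluate(p_right, rng, n_episodes=20):
    """Average return of a policy over several test episodes."""
    return sum(rollout(p_right, rng) for _ in range(n_episodes)) / n_episodes

def direct_search(n_iters=200, seed=0):
    """Direct search: perturb the policy, test it, keep it only if it works better."""
    rng = random.Random(seed)
    best_p = 0.5                      # initial policy parameter
    best_score = evaluate(best_p, rng)
    init_score = best_score
    for _ in range(n_iters):
        # Perturb the current best policy, clipped to a valid probability.
        cand = min(1.0, max(0.0, best_p + rng.gauss(0.0, 0.1)))
        score = evaluate(cand, rng)
        if score > best_score:        # keep those policies that work best
            best_p, best_score = cand, score
    return best_p, best_score, init_score
```

By construction the loop never accepts a policy that scored worse on its test episodes, so the returned score is at least the initial one; noisy evaluation (here, averaging over 20 episodes) is the only concession to the stochastic environment.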

Juergen Schmidhuber
2003-02-19
