For an agent living in a non-deterministic Markov environment (NME),
what is, in theory, the fastest way of acquiring information
about the environment's statistical properties? The answer is to design
``optimal'' sequences of ``experiments'' by performing action
sequences that maximize expected information gain.
This notion is implemented by combining concepts from information
theory and reinforcement learning. Experiments show that the resulting
method, reinforcement driven information acquisition, can explore
certain NMEs much faster than conventional random exploration.
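
As a hedged illustration of how the two ingredients might fit together,
the following sketch uses notation that is assumed here and not fixed by
the text above: $c_{ijk}(t)$ counts the observed transitions from state
$S_i$ to state $S_k$ under action $a_j$ up to time $t$, $p^*_{ijk}(t)$ is
the corresponding maximum likelihood estimate, $D(t)$ is one possible
measure of information gain (the Kullback-Leibler distance between
successive estimates), and $\alpha$ and $\gamma$ are the usual Q-learning
learning rate and discount factor.
\begin{eqnarray*}
p^*_{ijk}(t) & = & \frac{c_{ijk}(t)}{\sum_{l} c_{ijl}(t)}, \\
D(t) & = & \sum_{k} p^*_{ijk}(t+1) \log \frac{p^*_{ijk}(t+1)}{p^*_{ijk}(t)}, \\
Q(S_i, a_j) & \leftarrow & (1 - \alpha) Q(S_i, a_j)
  + \alpha \left[ D(t) + \gamma \max_{a} Q(S_k, a) \right].
\end{eqnarray*}
Under these assumptions the information gain $D(t)$ simply takes the
place of the external reward in Q-learning, so that action sequences
expected to change the model most are preferred. Terms in which an
estimate is zero need a separate convention, and other distance
measures between successive estimates could be substituted for $D(t)$.
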

Keywords: Exploration, reinforcement learning, Q-learning,
information gain, maximum likelihood models,
non-deterministic Markovian environments,
reinforcement directed information acquisition.