REINFORCEMENT DRIVEN INFORMATION ACQUISITION IN NON-DETERMINISTIC ENVIRONMENTS

For an agent living in a non-deterministic Markov environment (NME), what is, in theory, the fastest way of acquiring information about its statistical properties? The answer is: To design ``optimal'' sequences of ``experiments'' by performing action sequences that maximize expected information gain. This notion is implemented by combining concepts from information theory and reinforcement learning. Experiments show that the resulting method, reinforcement driven information acquisition, can explore certain NMEs much faster than conventional random exploration.

Keywords: Exploration, reinforcement learning, Q-learning, information gain, maximum likelihood models, non-deterministic Markovian environments, reinforcement directed information acquisition.

REINFORCEMENT DRIVEN INFORMATION ACQUISITION IN NON-DETERMINISTIC ENVIRONMENTS

Abstract: