
EXPERIMENTS

A `curious' adaptive agent based on Watkins' Q-learning method was tested in artificial non-deterministic discrete-state environments. $M$ (the world model), $C$, a controller $A$, and a module $Q$ for evaluating pairs of environmental states and actions were implemented as general back-propagation networks.

The agent could move around in a two-dimensional world with 100 different states. The environment was reactive: $M$'s task was to predict the environment's reactions, which were partly random and partly deterministic.
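For illustration only, here is a minimal tabular sketch of such a setup, assuming a 10 x 10 grid (100 states), a four-symbol reaction alphabet, and simple lookup tables in place of the back-propagation networks; the intrinsic reward is taken to be the decrease of $M$'s running prediction error, a simplified reading of adaptive curiosity rather than the exact update rule used in the experiments. All names and numerical choices are hypothetical.

import random
from collections import defaultdict

# Hypothetical 10 x 10 grid -> 100 states; the layout used in [6] may differ.
N = 10
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
REACTIONS = [0, 1, 2, 3]                  # assumed discrete reaction alphabet

def react(state, action):
    """Reactive toy environment: reactions are deterministic for some
    state/action pairs and random for the others (an assumption)."""
    x, y = state
    if (x + y) % 2 == 0:                  # arbitrary 'deterministic zone'
        return (x * 7 + y * 3 + ACTIONS.index(action)) % 4
    return random.choice(REACTIONS)

def move(state, action):
    """Deterministic motion on the grid, clipped at the borders."""
    x, y = state
    return (min(max(x + action[0], 0), N - 1),
            min(max(y + action[1], 0), N - 1))

# Tabular stand-ins for the networks M and Q of the paper.
model = {}                                # M: last observed reaction per (s, a)
pred_error = defaultdict(lambda: 1.0)     # running prediction error of M
q_values = defaultdict(float)             # Q: state/action evaluations

def curious_run(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Q-learning whose reward is the decrease of M's prediction error."""
    state = (0, 0)
    for _ in range(steps):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_values[(state, a)])
        reaction = react(state, action)
        old_err = pred_error[(state, action)]
        miss = 0.0 if model.get((state, action)) == reaction else 1.0
        pred_error[(state, action)] = 0.9 * old_err + 0.1 * miss
        model[(state, action)] = reaction
        reward = old_err - pred_error[(state, action)]   # intrinsic reward
        next_state = move(state, action)
        best_next = max(q_values[(next_state, a)] for a in ACTIONS)
        q_values[(state, action)] += alpha * (
            reward + gamma * best_next - q_values[(state, action)])
        state = next_state

curious_run()

In this toy version the `random zone' never yields a lasting drop in prediction error, so its intrinsic reward averages out to roughly zero, and the learned Q-values steer the agent toward the deterministic, still-learnable part of the grid; purely random action selection has no such bias.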

The `curious' system was tested against the conventional random search method. With both methods, the quality of $M$ at time $t$ was judged by the sum $E(t)$ of the squared differences between the possible deterministic reactions of the environment and $M$'s corresponding predictions.
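In one possible notation (an assumption; the paper's symbols may differ), with ${\cal D}$ denoting the set of state/action pairs that provoke deterministic reactions, $r(s,a)$ the deterministic reaction, and $M_t(s,a)$ the model's prediction at time $t$, this criterion reads

$E(t) = \sum_{(s,a) \in {\cal D}} \left( r(s,a) - M_t(s,a) \right)^2 .$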

With exploration guided by the principle of adaptive curiosity, $E(t)$ decreased up to 10 times faster than with random search (see [6] for details). The reason for this superior performance was that the `curious' system soon found out that there were certain states of the environment where further performance improvement of $M$ could be expected, and it started to focus on these particular states. The random search method was not selective at all and therefore wasted much time on pointless exploration of states that did not allow any performance improvement.

The more complex the environment, the greater the benefits to be expected from the principle of adaptive curiosity. Ongoing experiments focus on increasingly complex worlds, on non-local input/output representations, and on the expected `generalization capabilities' of non-trivial networks with hidden units.

