Next: EXPERIMENTS
Up: CURIOUS MODEL-BUILDING CONTROL SYSTEMS
Previous: A CURIOUS SYSTEM BASED
The reinforcement generating mechanism
for the reinforcement learning systems described above can be modified in
various ways. For instance,
consider the model network's response to a given input after the model
network has been adjusted at a given time step.
We can replace the confidence network by
a network which at every time step receives the current input
and whose target output
is the current change of the model network's output
caused by the model network's learning algorithm (a small
learning rate should be used).
This replacement network will learn
approximations of the expectations
of the changes of the model network's responses to
given inputs. The absolute value
of the replacement network's output
(an approximation
of the magnitude of the expected change)
should be taken as the reinforcement for the adaptive critic or
the Q-learning algorithm (the reinforcement
learning algorithm does not have to be specified here):
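The control system's
curiosity goal
at a given time step is to maximize
the discounted sum of future reinforcements, where
a discount rate determines how strongly later reinforcements are weighted.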
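
The mechanism described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: both networks are replaced by a single linear unit trained with the delta rule, and every name (LinearUnit, curiosity_step, run_demo), the learning rates, and the choice to read the reinforcement before updating the predictor are assumptions made only for illustration.

```python
def dot(w, x):
    """Inner product of two equally long lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


class LinearUnit:
    """Hypothetical stand-in for a network: one linear unit, delta-rule learning."""

    def __init__(self, n_inputs, learning_rate):
        self.w = [0.0] * n_inputs
        self.lr = learning_rate

    def output(self, x):
        return dot(self.w, x)

    def train(self, x, target):
        error = target - self.output(x)
        self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]


def curiosity_step(model, change_predictor, x, observed):
    """One time step; returns the curiosity reinforcement for input x."""
    before = model.output(x)
    model.train(x, observed)
    # Change of the model's response to x caused by its own learning step.
    change = model.output(x) - before
    # Assumption: the reinforcement is read out before the predictor is updated.
    reinforcement = abs(change_predictor.output(x))
    # The predictor's target is the (signed) change of the model's output.
    change_predictor.train(x, change)
    return reinforcement


def run_demo(steps=200):
    """Repeatedly present one fully learnable input/target pair."""
    model = LinearUnit(n_inputs=1, learning_rate=0.1)
    change_predictor = LinearUnit(n_inputs=1, learning_rate=0.1)
    return [curiosity_step(model, change_predictor, [1.0], 2.0)
            for _ in range(steps)]


if __name__ == "__main__":
    rewards = run_demo()
    print("peak reinforcement:", max(rewards))
    print("final reinforcement:", rewards[-1])
```

In this toy setting the reinforcement first rises while the model is still changing and then decays towards zero once the model has learned the mapping, so a reinforcement learner driven by it would lose interest in situations it can already predict.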
An alternative would be to make predictions about the
(discounted) sum of future
changes of the model network's weight vector and use these predictions in an
analogous manner.
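
As a rough sketch of this alternative, the training targets for such a predictor could be computed from a recorded history of weight vectors as follows. Measuring a change by its Euclidean norm, including the current step in the sum, and the names weight_change_norms and discounted_future_sums are assumptions made for illustration; the text does not specify them.

```python
import math


def weight_change_norms(weight_history):
    """Euclidean norm of each successive change of the weight vector."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(weight_history, weight_history[1:])
    ]


def discounted_future_sums(changes, gamma):
    """sums[t] = changes[t] + gamma * changes[t+1] + gamma**2 * changes[t+2] + ..."""
    sums = [0.0] * len(changes)
    running = 0.0
    # Accumulate from the last step backwards so each entry reuses the next one.
    for t in range(len(changes) - 1, -1, -1):
        running = changes[t] + gamma * running
        sums[t] = running
    return sums


if __name__ == "__main__":
    history = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 5.0]]
    changes = weight_change_norms(history)
    print(discounted_future_sums(changes, gamma=0.5))
```

A predictor trained on these sums could then supply the reinforcement in place of the predicted output change.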
Juergen Schmidhuber
2003-02-28