Several approaches to on-line supervised sequence learning have been proposed, including back-propagation through time (BPTT), e.g. [Williams and Peng, 1990], the IID- or RTRL-algorithm [Robinson and Fallside, 1987][Williams and Zipser, 1989], and the recent fast-weight algorithm [Schmidhuber, 1991b]. These approaches are computationally intensive: BPTT is not local in time, and RTRL-like algorithms, including their more efficient recent relatives [Schmidhuber, 1991d], are not local in space [Schmidhuber, 1991c]. Common to all of these approaches is that they do not try to focus selectively on relevant inputs; they waste resources by attending to every input. In many applications, these methods have a second drawback: the longer the time lag between an event and the occurrence of a corresponding error, the less information is carried by the corresponding back-propagated error signals. [Mozer, 1990] and [Rohwer, 1989] have addressed the latter problem but not the former.
How can a system learn to focus on the relevant points in time? What does it mean for a point in time to be relevant? How can the system learn to reduce the number of inputs to be considered over time without losing information? A major contribution of this work is an adaptive method for removing redundant information from sequences. The next section shows that the system ought to focus on unexpected inputs and ignore expected ones.
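As a rough illustration of why ignoring expected inputs need not lose information, consider the following minimal sketch. It is not the method developed in this work: the fixed, hand-written predictor `repeat_last` is a hypothetical stand-in for an adaptive predictor, and the names `compress` and `reconstruct` are assumptions introduced only for this example. The sketch keeps an input, together with its time step, only when the predictor fails to predict it, and then recovers the full sequence from the kept inputs and the same predictor.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A predictor maps the inputs seen so far to a guess for the next input.
# (Hypothetical interface, assumed for this illustration only.)
Predictor = Callable[[Sequence[str]], Optional[str]]


def compress(seq: Sequence[str], predict: Predictor) -> List[Tuple[int, str]]:
    """Keep only the unexpected inputs, each tagged with its time step."""
    kept = []
    for t, x in enumerate(seq):
        if predict(seq[:t]) != x:  # the prediction failed: x is unexpected
            kept.append((t, x))
    return kept


def reconstruct(kept: List[Tuple[int, str]], length: int,
                predict: Predictor) -> List[str]:
    """Rebuild the full sequence from the unexpected inputs alone."""
    surprises = dict(kept)
    out: List[str] = []
    for t in range(length):
        if t in surprises:
            out.append(surprises[t])
        else:
            # Step t was not flagged, so the prediction matched the input.
            out.append(predict(out))
    return out


def repeat_last(history: Sequence[str]) -> Optional[str]:
    """Toy predictor (an assumption): the next input repeats the last one."""
    return history[-1] if history else None


seq = list("aaaabbbbbbac")
kept = compress(seq, repeat_last)
print(kept)  # [(0, 'a'), (4, 'b'), (10, 'a'), (11, 'c')]
print(reconstruct(kept, len(seq), repeat_last) == seq)  # True
```

Here twelve inputs reduce to four, and the time steps stored with the kept inputs are what make the reduction reversible. How much is saved depends entirely on how well the predictor models the sequence.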