Once events become expected, they tend to become `subconscious'. There is an obvious analogy to the chunking algorithm: The chunker's attention is removed from events that become expected; they become `subconscious' (automatized) and give rise to even higher-level `abstractions' of the chunker's `consciousness'.
The chunking systems described in [Schmidhuber, 1991a], [Schmidhuber, 1991c] and the current paper try to detect temporal regularities and learn to use them for identifying relevant points in time. A general criticism of more conventional algorithms is that they do not try to selectively focus on relevant inputs; by attending equally to every input, they waste computational resources.
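The underlying idea can be illustrated with a toy sketch. The snippet below uses a simple order-1 frequency table as the predictor rather than the recurrent networks of the cited papers; the names `TablePredictor` and `compress_history` are invented for this illustration. A lower-level predictor learns to anticipate the next symbol, and only prediction failures (the `unexpected' events) are passed up as input to a hypothetical higher level:

```python
# Illustrative sketch of history compression with a frequency-table
# predictor (an assumption for brevity; the paper uses recurrent
# neural networks). Only symbols the predictor fails to anticipate
# are handed to the higher level, compressing the sequence.
from collections import defaultdict, Counter

class TablePredictor:
    """Predicts the next symbol as the most frequent successor
    of the current symbol observed so far (order-1 statistics)."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, context):
        # Return the most frequent successor, or None if unseen.
        if self.counts[context]:
            return self.counts[context].most_common(1)[0][0]
        return None

    def update(self, context, nxt):
        self.counts[context][nxt] += 1

def compress_history(sequence):
    """Train the predictor online; return (time, symbol) pairs for
    every unexpected event -- the input to the higher level."""
    pred = TablePredictor()
    unexpected = []
    prev = None
    for t, sym in enumerate(sequence):
        if pred.predict(prev) != sym:       # prediction failure
            unexpected.append((t, sym))     # pass up to higher level
        pred.update(prev, sym)
        prev = sym
    return unexpected

# A mostly repetitive sequence with one surprise ('x'):
seq = "abcabcabcabxabcabc"
higher_level_input = compress_history(seq)
```

Once the repeating `abc` pattern has been learned, only the initial symbols and the deviation around `x` remain unexpected, so the higher level receives a much shorter description of the same history.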
Speech is a good example of a domain involving multi-level temporal structure. Ongoing research will explore the application of chunking systems to speech recognition.
The principle of history compression is not limited to neural networks. Any adaptive sequence processing device could make use of it.