The history compression technique formulated above defines expectation-mismatches in a yes-or-no fashion: Each input unit whose activation is not predictable at a certain time gives rise to an unexpected event. Each unexpected event provokes an update of the internal state of a higher-level predictor. The updates always take place according to the conventional activation spreading rules for recurrent neural nets. There is no concept of a partial mismatch or of a `near-miss'. There is no possibility of updating the higher-level net `just a little bit' in response to a `nearly expected input'. In practical applications, some `epsilon' has to be used to define an acceptable mismatch.

In reply to the above criticism, *continuous history compression*
is based on the following ideas:

We use a local input representation. The components of the predictor's output vector $p(t)$ are forced to sum up to 1 and are interpreted as a prediction of the probability distribution of the possible next inputs $x(t+1)$: $p_i(t)$ is interpreted as the prediction of the probability that $x_i(t+1)$ is 1.
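For concreteness, a minimal sketch of this sum-to-1 constraint, assuming NumPy and a hypothetical helper `normalize_prediction`; a softmax is used here as one possible way to enforce the constraint, which the text itself does not prescribe:

```python
import numpy as np

def normalize_prediction(logits):
    """Map raw predictor outputs to a probability distribution.

    The softmax forces the components to be positive and to sum to 1,
    so component i can be read as the predicted probability that
    input unit i will be active, i.e. that x_i(t+1) = 1.
    """
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Example: a predictor output over 4 locally coded input symbols.
p = normalize_prediction(np.array([2.0, 0.5, 0.1, -1.0]))
print(p, p.sum())                      # components sum to 1
```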

The output entropy

$$H(t) = -\sum_i p_i(t)\,\log p_i(t)$$

can be interpreted as a measure of the predictor's confidence. In the worst case, the predictor will expect every possible event with equal probability, and the entropy is maximal.
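A small sketch of this confidence measure under the same assumptions (NumPy, hypothetical `entropy` helper):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy -sum_i p_i log p_i of the prediction vector.

    Low entropy: the predictor is confident about the next input.
    Maximum entropy log(n): every possible event is expected with
    equal probability 1/n (the worst case mentioned above).
    """
    return -np.sum(p * np.log(p + eps))

n = 4
print(entropy(np.full(n, 1.0 / n)))   # uniform prediction ...
print(np.log(n))                      # ... attains the maximum log(n)
```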

How much information is conveyed by $x_i(t+1)$ (relative to the current predictor), once it is observed? According to [23] it is

$$-\log p_i(t).$$
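A sketch of the corresponding surprise computation (again NumPy, with a hypothetical `surprise` helper):

```python
import numpy as np

def surprise(p, i, eps=1e-12):
    """Information conveyed by observing that input unit i is active:
    -log p_i. Well-predicted events carry little information,
    unexpected events carry a lot."""
    return -np.log(p[i] + eps)

p = np.array([0.90, 0.05, 0.03, 0.02])
print(surprise(p, 0))   # expected event: small surprise
print(surprise(p, 3))   # unexpected event: large surprise
```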
Continuous history compression defines update procedures that let highly informative events have a stronger influence on the history representation than less informative (more likely) events. The `strength' of an update in response to a more or less unexpected event is a monotonically increasing function of the information the event conveys. One of the methods uses Pollack's recursive auto-associative memories [13] for storing unexpected events, thus yielding an entirely local learning algorithm for learning extended sequences.
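As an illustration only (not the update procedures of the cited work), here is a hypothetical gating scheme in which the strength of the higher-level state update grows monotonically with the information conveyed by the current event:

```python
import numpy as np

def gated_update(h, h_candidate, info, scale=1.0):
    """Blend the old higher-level state h toward the candidate state
    computed by the conventional recurrent update, with a strength
    that increases monotonically with the information `info` conveyed
    by the current event (illustrative sketch only; the cited work
    defines its own procedures, e.g. one based on Pollack's recursive
    auto-associative memories)."""
    g = 1.0 - np.exp(-scale * info)    # gate in [0, 1), increasing in info
    return (1.0 - g) * h + g * h_candidate

h = np.zeros(3)
h_candidate = np.array([0.5, -0.2, 0.8])
print(gated_update(h, h_candidate, info=0.1))   # expected event: tiny update
print(gated_update(h, h_candidate, info=4.0))   # surprising event: strong update
```

The gate approaches 0 for perfectly predicted events, leaving the higher-level state essentially untouched, and approaches 1 for highly surprising events, recovering something close to the yes-or-no update of the original scheme.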
