The history compression technique formulated above defines expectation-mismatches in a yes-or-no fashion: Each input unit whose activation is not predictable at a certain time gives rise to an unexpected event. Each unexpected event provokes an update of the internal state of a higher-level predictor. The updates always take place according to the conventional activation spreading rules for recurrent neural nets. There is no concept of a partial mismatch or of a `near-miss'. There is no possibility of updating the higher-level net `just a little bit' in response to a `nearly expected input'. In practical applications, some `epsilon' has to be used to define an acceptable mismatch.
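As an illustration of this discrete scheme, the following sketch (hypothetical names, not taken from the original formulation) applies an `epsilon' threshold to each input unit and reports a yes-or-no mismatch; only the flagged units would count as unexpected events:

```python
import numpy as np

def unexpected(prediction, actual, epsilon=0.1):
    """Yes-or-no mismatch test of the discrete scheme: every input
    unit whose activation deviates from its prediction by more than
    epsilon counts as an unexpected event."""
    return np.abs(prediction - actual) > epsilon

# There is no notion of a partial mismatch or `near-miss': a unit is
# either flagged (triggering an update of the higher-level predictor)
# or it is not.
pred   = np.array([0.95, 0.05, 0.40])
actual = np.array([1.0,  0.0,  1.0 ])
print(unexpected(pred, actual))   # [False False  True]
```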
In reply to the above criticism, continuous history compression is based on the following ideas:
We use a local input representation. The components of the predictor's output $y(t)$ are forced to sum up to 1 and are interpreted as a prediction of the probability distribution of the possible inputs $x(t)$: $y_i(t)$ is interpreted as the prediction of the probability that $x_i(t)$ is 1.
The output entropy $-\sum_i y_i(t) \log y_i(t)$ measures how uncertain the predictor currently is about the next input. How much information is conveyed by $x(t)$ (relative to the current predictor), once it is observed? According to information theory, it is $-\log y_k(t)$, where $k$ is the index of the input unit with $x_k(t) = 1$ (see the numerical sketch below).
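A minimal numerical sketch of these ideas (hypothetical variable names, assuming the predictor's outputs are normalized by a softmax): the outputs are forced to form a probability distribution over the locally represented inputs, the output entropy measures the predictor's uncertainty before the observation, and the surprisal $-\log y_k(t)$ measures the information conveyed by the observed input:

```python
import numpy as np

def softmax(z):
    """Force the predictor's outputs to sum to 1 so that they can be
    read as a probability distribution over the possible inputs."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Predictor outputs (before normalization) for a locally represented
# input with 4 possible symbols.
y = softmax(np.array([2.0, 0.5, 0.1, -1.0]))

# Output entropy: how uncertain is the predictor before observing x(t)?
entropy = -np.sum(y * np.log(y))

# Suppose the observed input x(t) has unit k switched on.
k = 2
surprisal = -np.log(y[k])   # information conveyed by x(t), relative
                            # to the current predictor

print(f"prediction = {np.round(y, 3)}")
print(f"entropy    = {entropy:.3f} nats")
print(f"surprisal  = {surprisal:.3f} nats")
```

Unlike the yes-or-no test above, the surprisal is a graded quantity, which is what makes it possible to respond to a `nearly expected input' with only a small change rather than a full update.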