The history compression technique formulated above defines expectation-mismatches in a yes-or-no fashion: Each input unit whose activation is not predictable at a certain time gives rise to an unexpected event. Each unexpected event provokes an update of the internal state of a higher-level predictor. The updates always take place according to the conventional activation spreading rules for recurrent neural nets. There is no concept of a partial mismatch or of a `near-miss'. There is no possibility of updating the higher-level net `just a little bit' in response to a `nearly expected input'. In practical applications, some `epsilon' has to be used to define an acceptable mismatch.
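The all-or-nothing mismatch rule above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the tolerance `epsilon` and the function names are assumptions, standing in for whatever acceptable-mismatch threshold a practical system would choose.

```python
import numpy as np

def unexpected(prediction, observed, epsilon=0.1):
    """Yes-or-no mismatch test of discrete history compression:
    an input unit counts as unexpected if its predicted activation
    deviates from the observed one by more than epsilon.
    (epsilon = 0.1 is an illustrative tolerance, not from the text.)"""
    return np.abs(prediction - observed) > epsilon

# The higher-level predictor is updated only when at least one
# unit mismatches -- there is no notion of a partial update.
pred = np.array([0.95, 0.02, 0.03])
obs = np.array([1.0, 0.0, 0.0])
print(unexpected(pred, obs).any())  # -> False: input fully expected
```

Note that a `nearly expected' input (deviation just above `epsilon`) triggers exactly the same full update as a wildly unexpected one, which is the criticism the continuous variant addresses.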
In reply to the above criticism, continuous history compression is based on the following ideas:
We use a local input representation. The components of $y(t)$ are forced to sum up to 1 and are interpreted as a prediction of the probability distribution of the possible next inputs $x(t+1)$: $y_i(t)$ is interpreted as the prediction of the probability that $x_i(t+1)$ is 1.
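One standard way to force the predictor's outputs to sum to 1 is a softmax normalization; the text only requires that the components sum to 1, so softmax here is an assumption for the sketch.

```python
import numpy as np

def softmax(z):
    """Normalize raw predictor outputs z so the components sum to 1,
    allowing y_i to be read as the predicted P(x_i(t+1) = 1).
    Subtracting the max is a standard numerical-stability trick."""
    e = np.exp(z - z.max())
    return e / e.sum()

y = softmax(np.array([2.0, 0.5, -1.0]))  # prediction y(t)
print(y.sum())  # -> 1.0: a valid probability distribution
```

With a local (one-hot) input representation, exactly one $x_j(t+1)$ is 1, so $y(t)$ assigns a probability to each possible next symbol.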
The output entropy $H(t) = - \sum_i y_i(t) \log y_i(t)$ measures the predictor's current uncertainty about the next input. How much information is conveyed by the event $x_j(t+1) = 1$ (relative to the current predictor), once it is observed? According to [23], it is $- \log y_j(t)$.