
CONTINUOUS HISTORY COMPRESSION

The history compression technique formulated above defines expectation-mismatches in a yes-or-no fashion: Each input unit whose activation is not predictable at a certain time gives rise to an unexpected event. Each unexpected event provokes an update of the internal state of a higher-level predictor. The updates always take place according to the conventional activation spreading rules for recurrent neural nets. There is no concept of a partial mismatch or of a `near-miss'. There is no possibility of updating the higher-level net `just a little bit' in response to a `nearly expected input'. In practical applications, some `epsilon' has to be used to define an acceptable mismatch.
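To make the binary criterion concrete, the following is a minimal sketch in Python (hypothetical names; no particular implementation is prescribed above) of the epsilon-thresholded mismatch test:

\begin{verbatim}
import numpy as np

# Minimal sketch of the yes-or-no mismatch test described above.
# An input unit counts as unexpected if the previous prediction for it
# deviates from the observed activation by more than epsilon.
def unexpected(prediction, observed, eps=0.1):
    """True if any input component is mispredicted by more than eps."""
    return bool(np.any(np.abs(prediction - observed) > eps))

p = np.array([0.95, 0.05])          # prediction for the next input
x = np.array([1.0, 0.0])            # actually observed input
print(unexpected(p, x, eps=0.01))   # True:  a near-miss still forces a full update
print(unexpected(p, x, eps=0.10))   # False: the mismatch is deemed acceptable
\end{verbatim}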

In reply to the above criticism, continuous history compression is based on the following ideas:

We use a local input representation. The components of $z^p(t)$ are forced to sum up to 1 and are interpreted as a prediction of the probability distribution of the possible $x^p(t+1)$: $z^p_j(t)$ is interpreted as the predicted probability that $x^p_j(t+1)$ is 1.
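One way to enforce this constraint (a sketch only; the text merely requires that the components of $z^p(t)$ sum up to 1) is a softmax output layer:

\begin{verbatim}
import numpy as np

# Sketch: a softmax layer turns arbitrary output activations into a
# distribution over the possible next inputs, so that z^p_j(t) can be
# read as the predicted probability that x^p_j(t+1) is 1.
def softmax(a):
    e = np.exp(a - np.max(a))   # subtract the maximum for numerical stability
    return e / e.sum()

net_output = np.array([2.0, 0.5, -1.0])   # hypothetical pre-activations
z = softmax(net_output)
print(z, z.sum())                         # roughly [0.79 0.18 0.04], sums to 1
\end{verbatim}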

The output entropy

\begin{displaymath}- \sum_j z^p_j(t) \log z^p_j(t) \end{displaymath}

can be interpreted as a measure of the predictor's confidence: the lower the entropy, the more confident the prediction. In the worst case, the predictor will expect every possible event with equal probability, and the entropy is maximal.
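For example, with four possible events and the uniform prediction $z^p_j(t) = 1/4$ for all $j$, the entropy (using the natural logarithm) attains its maximum

\begin{displaymath} -\sum_{j=1}^{4} \frac{1}{4} \log \frac{1}{4} = \log 4 \approx 1.39, \end{displaymath}

whereas a confident prediction such as $(0.97, 0.01, 0.01, 0.01)$ yields an entropy of only about $0.17$.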

How much information is conveyed by $x^p(t+1)$ (relative to the current predictor) once it is observed? According to [23], if the $j$-th input unit is the one actually switched on at time $t+1$, the information is

\begin{displaymath}-\log z^p_j(t).\end{displaymath}
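For instance, if the predictor assigned probability $0.9$ to the event that actually occurs, observing it conveys little information, whereas an assigned probability of $0.05$ makes the same observation highly informative:

\begin{displaymath} -\log 0.9 \approx 0.11 \mbox{~nats} \quad \mbox{versus} \quad -\log 0.05 \approx 3.0 \mbox{~nats}. \end{displaymath}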

[] defines update procedures that let highly informative events have a stronger influence on the history representation than less informative (more likely) events. The `strength' of an update in response to a more or less unexpected event is a monotonically increasing function of the information the event conveys. One of the methods uses Pollack's recursive auto-associative memories [13] for storing unexpected events, thus yielding an entirely local learning algorithm for learning extended sequences.
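The following is a minimal sketch of the general idea only, not of the specific procedures defined in the cited reference (and it ignores the RAAM-based storage): the history representation is blended towards a candidate new state with a strength that grows monotonically with the information conveyed by the observed event.

\begin{verbatim}
import numpy as np

# Sketch (an assumption, not the reference's actual procedure): gate the
# update of the history representation h by a bounded, monotonically
# increasing function of the event's information, here 1 - exp(-info) = 1 - z[j].
def update_history(h, candidate, z, j):
    """Blend the old history with a candidate state, gated by surprise."""
    info = -np.log(z[j])              # information conveyed by observed event j
    strength = 1.0 - np.exp(-info)    # in [0, 1), increasing in info
    return (1.0 - strength) * h + strength * candidate

h = np.zeros(4)                               # current history representation
candidate = np.array([1.0, 0.0, 0.5, -0.5])   # state proposed by the higher-level net
z = np.array([0.9, 0.05, 0.05])               # prediction made at time t
print(update_history(h, candidate, z, 0))     # expected event: small change
print(update_history(h, candidate, z, 1))     # unexpected event: large change
\end{verbatim}

In such a scheme, a fully expected event ($z^p_j(t)$ close to 1) leaves the history representation almost unchanged, recovering the discrete scheme's `no update' as a limiting case.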

