
THE PRINCIPLE OF HISTORY COMPRESSION

A major contribution of this work is an adaptive method for removing redundant information from sequences. This principle can be implemented with the help of any of the methods mentioned in the introduction.

Consider a deterministic discrete-time predictor (not necessarily a neural network) whose state at time $t$ of sequence $p$ is described by an environmental input vector $x^p(t)$, an internal state vector $h^p(t)$, and an output vector $z^p(t)$. The environment may be non-deterministic. At time $0$, the predictor starts with $x^p(0)$ and an internal start state $h^p(0)$. At time $t \geq 0$, the predictor computes

\begin{displaymath}z^p(t)= f ( x^p(t), h^p(t)). \end{displaymath}

At time $t>0$, the predictor furthermore computes

\begin{displaymath}h^p(t) = g ( x^p(t-1), h^p(t-1)). \end{displaymath}
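To make the formalism concrete, the following is a minimal sketch of such a predictor in Python, with hand-coded toy rules for $f$ and $g$ over a small integer alphabet. The predictor need not take this form (it could, for instance, be a recurrent network); all names and update rules below are illustrative assumptions only.

\begin{verbatim}
# Minimal toy predictor: z(t) = f(x(t), h(t)),  h(t) = g(x(t-1), h(t-1)).
# f and g are arbitrary hand-coded rules over integers mod 5, chosen only
# to make the sketch runnable; they are not prescribed by the text.

def g(x_prev, h_prev):
    """Next internal state from previous input and previous state."""
    return (h_prev + x_prev) % 5

def f(x, h):
    """Prediction of the next input, given current input and state."""
    return (x + h) % 5

def run(xs, h0=0):
    """Run the predictor over the input sequence xs and return all z(t)."""
    h, zs = h0, []
    for t, x in enumerate(xs):
        if t > 0:
            h = g(xs[t - 1], h)   # h(t) = g(x(t-1), h(t-1))
        zs.append(f(x, h))        # z(t) = f(x(t), h(t))
    return zs
\end{verbatim}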

All information about the input at a given time $t_x$ can be reconstructed from knowledge of $t_x, f, g, x^p(0), h^p(0)$, and those pairs $(t_s, x^p(t_s))$ for which $0 < t_s \leq t_x$ and $z^p(t_s - 1) \neq x^p(t_s)$. This is because whenever $z^p(t)=x^p(t+1)$ at some time $t$, the predictor can predict the next input from the previous ones; the new input is then derivable by means of $f$ and $g$.
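Continuing the toy $f$ and $g$ from the sketch above, the following purely illustrative routines make the reconstruction argument explicit: compression stores only the pairs $(t_s, x^p(t_s))$ with $z^p(t_s - 1) \neq x^p(t_s)$, and reconstruction replays the predictor, consulting the stored pairs only where the prediction failed.

\begin{verbatim}
# Compression: keep x(0) plus the unpredicted pairs (t_s, x(t_s)).
def compress(xs, h0=0):
    h, kept = h0, []
    for t in range(1, len(xs)):
        if f(xs[t - 1], h) != xs[t]:      # z(t-1) != x(t): unpredicted
            kept.append((t, xs[t]))
        h = g(xs[t - 1], h)               # advance to h(t)
    return xs[0], kept

# Reconstruction: replay the predictor, overriding it where inputs were stored.
def reconstruct(x0, kept, length, h0=0):
    kept, xs, h = dict(kept), [x0], h0
    for t in range(1, length):
        predicted = f(xs[t - 1], h)       # z(t-1), the expected input
        xs.append(kept.get(t, predicted)) # stored value only if mispredicted
        h = g(xs[t - 1], h)
    return xs

xs = [1, 2, 3, 0, 4, 4, 3]
x0, kept = compress(xs)
assert reconstruct(x0, kept, len(xs)) == xs   # lossless by construction
\end{verbatim}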

Information about the observed input sequence can be compressed even further, beyond storing just the unpredicted input vectors $x^p(t_s)$: it suffices to know only those elements of the vectors $x^p(t_s)$ that were not correctly predicted.
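As a hedged illustration of this element-wise refinement (the variable names are assumptions, not notation from the text), one may store, for each unpredicted time step, only the index/value pairs of the mispredicted components and restore the rest from the prediction:

\begin{verbatim}
# Element-wise compression of a single vector x(t), given prediction z(t-1).
def compress_elements(x_t, z_prev):
    """Keep only the components of x(t) that z(t-1) got wrong."""
    return [(i, xi) for i, (xi, zi) in enumerate(zip(x_t, z_prev)) if xi != zi]

def restore_elements(z_prev, stored):
    """Rebuild x(t) from the prediction z(t-1) plus the stored corrections."""
    x_t = list(z_prev)
    for i, xi in stored:
        x_t[i] = xi
    return x_t

z_prev = [0, 1, 1, 0]                       # prediction z(t-1)
x_t    = [0, 1, 0, 0]                       # actual input; only component 2 differs
stored = compress_elements(x_t, z_prev)     # [(2, 0)]
assert restore_elements(z_prev, stored) == x_t
\end{verbatim}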

This observation implies that we can discriminate one sequence from another by knowing just the unpredicted inputs and the corresponding time steps at which they occurred. No information is lost if we ignore the expected inputs. We do not even have to know $f$ and $g$. I call this the principle of history compression.

From a theoretical point of view it is important to know at which times unexpected inputs occur; otherwise ambiguities may arise: two different input sequences may be mapped to the same shorter sequence of unpredicted inputs. For many practical tasks, however, there is no need to know the critical time steps (see section 5).
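The potential ambiguity can be seen with a deliberately trivial predictor that always expects a zero input (an assumption made only for this illustration): two different sequences then reduce to the same unpredicted values, and only the time steps tell them apart.

\begin{verbatim}
# Reduced description of xs under the constant prediction z(t) = 0.
def reduce_seq(xs, keep_times):
    return [(t, x) if keep_times else x
            for t, x in enumerate(xs) if t > 0 and x != 0]

a, b = [0, 0, 1, 0], [0, 1, 0, 0]           # same x(0), different sequences
assert reduce_seq(a, keep_times=False) == reduce_seq(b, keep_times=False) == [1]
assert reduce_seq(a, keep_times=True)  != reduce_seq(b, keep_times=True)
\end{verbatim}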

