With the offline variant of the approach, $P$'s training phase
is based on a set $F$ of training files.
Assume that the alphabet contains $k$ possible characters
$z_1, z_2, \ldots, z_k$.
The (local) representation of $z_i$ is a binary $k$-dimensional
vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position).
$P$ has $nk$ input units and $k$ output units.
$n$ is called the "time-window size".
We insert $n$ default characters $z_0$ at the beginning of each file.
The representation of the
default character, $r(z_0)$, is the $k$-dimensional zero-vector.
The $m$-th character of file $f$ (starting
from the first default character) is called $c^f_m$.
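The local representation and the padding with default characters can be illustrated by a minimal Python sketch. The function names `one_hot` and `pad_and_encode`, and the use of `None` to stand for the default character, are assumptions made for this illustration only; here `k` is the alphabet size and `n` the time-window size.

```python
from typing import List, Optional


def one_hot(index: Optional[int], k: int) -> List[int]:
    """Local representation of a character.

    index is the 1-based position of the character in the alphabet;
    index=None stands for the default character (assumption of this
    sketch) and yields the k-dimensional zero-vector.
    """
    vec = [0] * k
    if index is not None:
        vec[index - 1] = 1
    return vec


def pad_and_encode(text: str, alphabet: str, n: int) -> List[List[int]]:
    """Prepend n default characters to text and encode every character."""
    k = len(alphabet)
    position = {ch: i + 1 for i, ch in enumerate(alphabet)}
    padded = [None] * n + [position[ch] for ch in text]
    return [one_hot(i, k) for i in padded]


# Example: alphabet of k = 3 characters, time-window size n = 2.
encoded = pad_and_encode("ab", "abc", 2)
# encoded == [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
```

The first `n` vectors are zero-vectors, so that even the first real character of a file has a full window of predecessors.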
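For all $f \in F$ and all possible $m > n$,
$P$ receives as an input
$$ r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1}), $$
where $\circ$ is the concatenation operator for vectors.
$P$ produces as an output $P^f_m$, a
$k$-dimensional output vector.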
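How the inputs are assembled from a file may be sketched as follows. The helper name `training_pairs` is hypothetical; the sketch assumes the file has already been padded with `n` zero-vectors and encoded as a list of `k`-dimensional vectors. It pairs each concatenated window of `n` predecessors with the representation of the character to be predicted.

```python
from typing import List, Tuple


def training_pairs(encoded: List[List[int]], n: int) -> List[Tuple[List[int], List[int]]]:
    """Build (input, target) pairs from an encoded, padded file.

    For every position after the first n vectors, the input is the
    concatenation of the n preceding vectors (length n*k) and the
    target is the vector at that position (length k).
    """
    pairs = []
    for m in range(n, len(encoded)):  # 0-based m corresponds to 1-based m + 1 > n
        window = [x for vec in encoded[m - n:m] for x in vec]
        pairs.append((window, encoded[m]))
    return pairs


# Example with k = 2 and n = 2: two default vectors, then two characters.
encoded = [[0, 0], [0, 0], [1, 0], [0, 1]]
pairs = training_pairs(encoded, 2)
# pairs[0] == ([0, 0, 0, 0], [1, 0])
# pairs[1] == ([0, 0, 1, 0], [0, 1])
```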
Using back-propagation
[36][9][16][19],
$P$ is trained to minimize
$$ \frac{1}{2} \sum_{f \in F} \sum_{m > n} \left\| r(c^f_m) - P^f_m \right\|^2 . $$
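The quantity being minimized can be made concrete with a short sketch, assuming the usual sum-of-squares error (with the conventional factor 1/2) between the target representations and the corresponding output vectors. The function name `squared_error` is chosen for illustration; the sketch only evaluates the error and does not perform back-propagation.

```python
from typing import List, Sequence


def squared_error(targets: List[Sequence[float]], outputs: List[Sequence[float]]) -> float:
    """Half the summed squared Euclidean distance between targets and outputs."""
    total = 0.0
    for target, output in zip(targets, outputs):
        total += sum((t - p) ** 2 for t, p in zip(target, output))
    return 0.5 * total


# Two positions, alphabet size k = 2.
targets = [[1, 0], [0, 1]]
outputs = [[0.5, 0.5], [0.0, 1.0]]
error = squared_error(targets, outputs)  # 0.5 * (0.25 + 0.25 + 0.0) = 0.25
```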
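Let $(P^f_m)_j$ denote the $j$-th component of the vector $P^f_m$.
Due to the local character representation,
this error function
is minimized if, for all $f$ and for all appropriate $m > n$,
$(P^f_m)_j$ is equal to the conditional probability
$$ \Pr\left(c^f_m = z_j \mid c^f_{m-n}, \ldots, c^f_{m-1}\right). $$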
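The reason is that, for a fixed context, the output vector minimizing a sum of squared distances to one-hot targets is the mean of those targets, whose components are the relative frequencies of the successor characters. The following sketch computes these empirical conditional frequencies by counting; it is not the predictor network itself, and the name `conditional_frequencies` as well as the use of the empty string for the default character are assumptions of this illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def conditional_frequencies(text: str, alphabet: str, n: int) -> Dict[Tuple[str, ...], List[float]]:
    """Relative frequency of each alphabet character after each observed context.

    The context is the tuple of the n preceding characters; "" plays the
    role of the default character at the start of the file.
    """
    counts = defaultdict(Counter)
    padded = [""] * n + list(text)
    for m in range(n, len(padded)):
        counts[tuple(padded[m - n:m])][padded[m]] += 1
    table = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        table[context] = [counter[ch] / total for ch in alphabet]
    return table


# In "aab" with n = 1, the context "a" is followed once by "a" and once by "b".
table = conditional_frequencies("aab", "ab", 1)
# table[("a",)] == [0.5, 0.5]
```

A network with sufficient capacity that reaches the minimum of the error on the training files would reproduce exactly these frequencies for every context that occurs in them.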
For normalization purposes, we define
$$ P^f_m(i) = \frac{(P^f_m)_i}{\sum_{j=1}^{k} (P^f_m)_j} . $$
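The normalization simply divides each output component by the sum of all components, so that the results add up to one and can be used as a probability distribution. A minimal sketch follows; the function name `normalize` and the error raised for a non-positive sum are assumptions of this illustration, since the text does not say how that case is handled.

```python
from typing import List, Sequence


def normalize(output: Sequence[float]) -> List[float]:
    """Divide every component by the sum of all components."""
    total = sum(output)
    if total <= 0:
        # Assumption of this sketch: a non-positive sum is treated as an error.
        raise ValueError("sum of output components must be positive")
    return [x / total for x in output]


probabilities = normalize([1.0, 3.0])  # [0.25, 0.75]
```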
Juergen Schmidhuber
2003-02-19