
## PREDICTING CONDITIONAL PROBABILITIES

With the offline variant of the approach, $P$'s training phase is based on a set $F$ of training files. Assume that the alphabet contains $k$ possible characters $z_1, z_2, \ldots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position). $P$ has $nk$ input units and $k$ output units. $n$ is called the "time-window size". We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $r(z_0)$, is the $k$-dimensional zero vector. The $m$-th character of file $f \in F$ (starting from the first default character) is called $c^f_m$.
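A minimal sketch of the local representation in Python/NumPy may help; the function names and the alphabet and window sizes below are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

K = 128   # assumed alphabet size k (e.g. extended ASCII); illustrative only
N = 5     # assumed time-window size n; illustrative only

def one_hot(i, k=K):
    """Local representation r(z_i): a binary k-dimensional vector with
    exactly one non-zero component, at the i-th position (1-indexed)."""
    r = np.zeros(k)
    r[i - 1] = 1.0
    return r

def file_representations(char_indices, n=N, k=K):
    """Representations of one file's characters, preceded by the n default
    characters z_0, whose representation r(z_0) is the k-dimensional
    zero vector."""
    return [np.zeros(k)] * n + [one_hot(c, k) for c in char_indices]
```

Here `char_indices` is assumed to map each character of a file to its index in $1, \ldots, k$.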

For all $f \in F$ and all possible $m > n$, $P$ receives as an input

$$r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1}),$$

where $\circ$ is the concatenation operator for vectors. $P$ produces as an output $P^f_m$, a $k$-dimensional output vector. Using back-propagation [36][9][16][19], $P$ is trained to minimize

$$\frac{1}{2} \sum_{f \in F} \sum_{m > n} \left\| r(c^f_m) - P^f_m \right\|^2 .$$
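The following sketch, continuing the assumptions above, builds the concatenated time-window inputs and performs plain back-propagation steps on this squared-error objective. The single hidden layer, its size, the learning rate, and the tanh/sigmoid activations are illustrative choices; this section does not fix $P$'s internal architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def window_pairs(reps, n):
    """From a file's representation list (n zero vectors followed by the
    one-hot vectors of its characters), yield each n*k-dimensional input
    r(c_{m-n}) o ... o r(c_{m-1}) with its k-dimensional target r(c_m)."""
    for m in range(n, len(reps)):
        yield np.concatenate(reps[m - n:m]), reps[m]

class Predictor:
    """Feedforward predictor P with n*k input units and k output units,
    trained by gradient descent on 1/2 * sum ||r(c_m^f) - P_m^f||^2."""

    def __init__(self, n, k, hidden=64, lr=0.1):
        self.W1 = rng.normal(0.0, 0.1, (hidden, n * k))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (k, hidden))
        self.b2 = np.zeros(k)
        self.lr = lr

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        o = sigmoid(self.W2 @ h + self.b2)   # the output vector P_m^f
        return h, o

    def train_step(self, x, y):
        """One back-propagation step for a single (input, target) pair."""
        h, o = self.forward(x)
        delta2 = (o - y) * o * (1.0 - o)               # output-layer error signal
        delta1 = (self.W2.T @ delta2) * (1.0 - h ** 2) # hidden-layer error signal
        self.W2 -= self.lr * np.outer(delta2, h)
        self.b2 -= self.lr * delta2
        self.W1 -= self.lr * np.outer(delta1, x)
        self.b1 -= self.lr * delta1
        return 0.5 * np.sum((y - o) ** 2)              # current squared error
```

Training would sweep `train_step` over the (input, target) pairs of every training file, typically for several epochs.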

Let $P^f_m(i)$ denote the $i$-th component of the vector $P^f_m$. Due to the local character representation, this error function is minimized if, for all $f \in F$ and for all appropriate $m > n$, $P^f_m(i)$ is equal to the conditional probability

$$\Pr\left(c^f_m = z_i \mid c^f_{m-n}, \ldots, c^f_{m-1}\right).$$

For normalization purposes, we define

$$\tilde{P}^f_m(i) = \frac{P^f_m(i)}{\sum_{j=1}^{k} P^f_m(j)} .$$
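In code, this normalization simply divides each component of the output vector by the sum of all $k$ components (with sigmoid output units the components are positive, so the sum is non-zero); a minimal sketch:

```python
def conditional_probabilities(output):
    """Normalize the k-dimensional output vector P_m^f so that its
    components sum to one and can serve as probability estimates."""
    return output / output.sum()
```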
