Assume that the alphabet contains $k$ possible characters $z_1, z_2, \ldots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position). The predictor network $P$ has $nk$ input units and $k$ output units, where $n$ is called the ``time-window size''. We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $r(z_0)$, is the $k$-dimensional zero-vector. The $m$-th character of file $f$ (starting from the first default character) is called $c^f_m$.
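The local representation and the default-character padding described above can be sketched as follows. This is a minimal illustration, not the authors' code: characters are assumed to be given as integer indices 1..k, index 0 is assumed to stand for the default character, and the function names `one_hot` and `pad_with_defaults` are hypothetical.

```python
def one_hot(i, k):
    """Local representation of character index i for an alphabet of size k.

    Indices 1..k give a binary vector with a single 1 at position i.
    Index 0 (assumed here to denote the default character) gives the
    k-dimensional zero vector.
    """
    if not 0 <= i <= k:
        raise ValueError("character index out of range")
    vec = [0] * k
    if i > 0:
        vec[i - 1] = 1
    return vec


def pad_with_defaults(indices, n):
    """Prepend n default characters (index 0) to a file given as indices."""
    return [0] * n + list(indices)


if __name__ == "__main__":
    print(one_hot(2, 4))
    print(pad_with_defaults([1, 2], 3))
```

With a time-window size of n, the padding guarantees that even the first real character of a file has n predecessors.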

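For all files $f$ in the training set $F$ and all possible $m > n$,
$P$ receives as an input

$$ r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \cdots \circ r(c^f_{m-1}), \tag{1} $$

where $\circ$ denotes the concatenation of vectors. In response, $P$ produces a $k$-dimensional output vector $P^f_m$. $P$ is trained to minimize

$$ \frac{1}{2} \sum_{f \in F} \sum_{m > n} \left\| r(c^f_m) - P^f_m \right\|^2. \tag{2} $$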
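To make the input construction and the training objective concrete, the following sketch builds the concatenated window of the n preceding character representations and evaluates the squared error for a single position. It assumes 1-based positions m into a padded file, character indices 0..k with 0 as the default character, and a factor of 1/2 in the error; the names `window_input` and `squared_error` are hypothetical, and the network itself is not modeled.

```python
def one_hot(i, k):
    """Binary k-vector with a 1 at position i; index 0 gives the zero vector."""
    vec = [0] * k
    if i > 0:
        vec[i - 1] = 1
    return vec


def window_input(padded, m, n, k):
    """Concatenate the representations of characters m-n, ..., m-1.

    `padded` holds character indices including the n leading defaults;
    position m is 1-based, so character m is padded[m - 1].
    """
    if not n < m <= len(padded):
        raise ValueError("m must satisfy n < m <= len(padded)")
    vec = []
    for pos in range(m - n, m):
        vec.extend(one_hot(padded[pos - 1], k))
    return vec  # length n * k


def squared_error(target_index, output, k):
    """Half the squared Euclidean distance between target and output."""
    target = one_hot(target_index, k)
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


if __name__ == "__main__":
    padded = [0, 0, 2, 3, 1]  # n = 2 defaults followed by three characters
    print(window_input(padded, 5, 2, 3))
    print(squared_error(1, [0.5, 0.5, 0.0], 3))
```

Summing `squared_error` over all files and all positions m > n gives the quantity that training is meant to reduce.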

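Expression (2) is minimal if
$P^f_m$ always equals

$$ E\left( r(c^f_m) \mid c^f_{m-n}, \ldots, c^f_{m-1} \right), \tag{3} $$

the conditional expectation of $r(c^f_m)$ given the $n$ preceding characters. Because of the local character representation, this is equivalent to the $i$-th component $(P^f_m)_i$ being equal to the conditional probability

$$ \Pr\left( c^f_m = z_i \mid c^f_{m-n}, \ldots, c^f_{m-1} \right) \tag{4} $$

for all $f$, all appropriate $m > n$, and all $i$.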
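The link between the conditional expectation of a one-hot target and the conditional probabilities can be checked empirically. The sketch below averages the one-hot targets observed after each context of n characters; by construction, component i of that average is the relative frequency with which character i follows the context, and the per-context mean is the constant vector that minimizes the summed squared error. The function name `conditional_means` and the toy data are illustrative assumptions, and a table of averages stands in for a trained network.

```python
from collections import defaultdict


def conditional_means(padded, n, k):
    """Average the one-hot targets observed after each length-n context.

    `padded` holds character indices (0 = default character) including
    the n leading defaults. Returns a dict mapping each context tuple to
    a k-vector of relative frequencies.
    """
    sums = defaultdict(lambda: [0.0] * k)
    counts = defaultdict(int)
    for m in range(n + 1, len(padded) + 1):  # 1-based positions m > n
        context = tuple(padded[m - n - 1 : m - 1])
        target = padded[m - 1]
        sums[context][target - 1] += 1.0  # add the one-hot target
        counts[context] += 1
    return {ctx: [s / counts[ctx] for s in vec] for ctx, vec in sums.items()}


if __name__ == "__main__":
    padded = [0, 1, 2, 1, 2, 1, 1]  # n = 1, alphabet of k = 2 characters
    for ctx, probs in sorted(conditional_means(padded, 1, 2).items()):
        print(ctx, probs)
```

In this toy file, character 1 is followed twice by character 2 and once by character 1, so the mean target after context (1,) has components 1/3 and 2/3.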

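In general, the outputs $(P^f_m)_i$
will not quite match the corresponding
conditional probabilities.
For normalization purposes, we define

$$ P^f_m(i) = \frac{(P^f_m)_i}{\sum_{j=1}^{k} (P^f_m)_j}, \tag{5} $$

where $(P^f_m)_j$ denotes the $j$-th component of the output vector $P^f_m$.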
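The normalization step amounts to dividing each output component by the sum of all components, so that the results add up to one. The sketch below assumes that the raw outputs are non-negative with a positive sum (as would hold for, e.g., sigmoid output units); the function name `normalize` is hypothetical.

```python
def normalize(output):
    """Rescale a non-negative output vector so its components sum to 1."""
    if any(o < 0 for o in output):
        raise ValueError("outputs are assumed to be non-negative")
    total = sum(output)
    if total <= 0:
        raise ValueError("outputs must have a positive sum")
    return [o / total for o in output]


if __name__ == "__main__":
    print(normalize([0.2, 0.2, 0.4]))
```

The relative sizes of the components are preserved; only their common scale changes.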