
# A PREDICTOR OF CONDITIONAL PROBABILITIES

Assume that the alphabet contains $k$ possible characters $z_1, z_2, \ldots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $v(z_i)$ with exactly one non-zero component (at the $i$-th position). $P$ has $nk$ input units and $k$ output units. $n$ is called the "time-window size". We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $v(z_0)$, is the $k$-dimensional zero-vector. The $m$-th character of file $f$ (starting from the first default character) is called $c^f_m$.
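The local representation and the $n$ prepended default characters can be sketched as follows (a minimal illustration with a toy three-character alphabet; the names `v` and `pad_file` are ours, and `None` stands in for the default character $z_0$):

```python
import numpy as np

ALPHABET = ['a', 'b', 'c']        # toy alphabet, k = 3
k = len(ALPHABET)
n = 2                             # time-window size

def v(ch):
    """Local (one-hot) representation; the default character maps to the zero vector."""
    vec = np.zeros(k)
    if ch is not None:            # None plays the role of the default character z_0
        vec[ALPHABET.index(ch)] = 1.0
    return vec

def pad_file(chars):
    """Insert n default characters at the beginning of the file."""
    return [None] * n + list(chars)

print(v('b'))        # → [0. 1. 0.]
print(v(None))       # → [0. 0. 0.]
print(len(pad_file("abca")))  # → 6
```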

For all $p > n$ and all possible $f$, $P$ receives as an input

$$v(c^f_{p-n}) \circ v(c^f_{p-n+1}) \circ \cdots \circ v(c^f_{p-1}) \qquad (1)$$

where $\circ$ is the concatenation operator for vectors. $P$ produces as an output $P^f_p$, a $k$-dimensional output vector. Using back-propagation [6][2][3][4], $P$ is trained to minimize

$$\sum_f \sum_p \left\| P^f_p - v(c^f_p) \right\|^2 \qquad (2)$$
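The input of expression (1) and the objective (2) can be sketched with a single-layer linear predictor trained by per-example gradient descent (a simplified stand-in for the paper's back-propagation network; all names, the toy file, and the learning rate are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = ['a', 'b', 'c']
k, n = len(ALPHABET), 2

def v(ch):
    """Local (one-hot) representation; None is the default character (zero vector)."""
    vec = np.zeros(k)
    if ch is not None:
        vec[ALPHABET.index(ch)] = 1.0
    return vec

def window_input(chars, p):
    """Expression (1): concatenation of the n character vectors preceding position p."""
    return np.concatenate([v(chars[p - n + j]) for j in range(n)])

chars = [None] * n + list("abcabcab")        # file with n default characters prepended
W = rng.normal(scale=0.1, size=(k, n * k))   # linear predictor weights (a simplification)

for epoch in range(500):                     # minimize expression (2) by gradient descent
    for p in range(n, len(chars)):
        x, target = window_input(chars, p), v(chars[p])
        out = W @ x
        W -= 0.1 * np.outer(out - target, x) # gradient of ||out - target||^2 (up to a factor 2)
```

For this deterministic toy file the context `('a', 'b')` is always followed by `'c'`, so the trained output for that window approaches $v(\mathtt{c})$.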

Expression (2) is minimal if $P^f_p$ always equals

$$E\left( v(c^f_p) \mid c^f_{p-n}, \ldots, c^f_{p-1} \right) \qquad (3)$$

the conditional expectation of $v(c^f_p)$, given $c^f_{p-n}, \ldots, c^f_{p-1}$. Due to the local character representation, this is equivalent to $(P^f_p)_i$ being equal to the conditional probability

$$\Pr\left( c^f_p = z_i \mid c^f_{p-n}, \ldots, c^f_{p-1} \right) \qquad (4)$$

for all $f$ and for all appropriate $p$, where $(P^f_p)_i$ denotes the $i$-th component of the vector $P^f_p$.
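The step from (2) to (3) can be checked numerically: for a fixed context, the single vector minimizing the summed squared error to the observed one-hot targets is their mean, which is exactly the empirical conditional distribution (a small sanity check of ours, not from the paper):

```python
import numpy as np

# One-hot targets observed after one fixed context: 'a' twice, then 'b' once (k = 3).
targets = np.array([[1., 0., 0.],
                    [1., 0., 0.],
                    [0., 1., 0.]])

# Gradient descent on the mean squared error (1/T) * sum_t ||out - targets[t]||^2
# with a single free output vector `out`.
out = np.zeros(3)
for _ in range(1000):
    out -= 0.1 * (out - targets).mean(axis=0)

print(out)  # approaches [2/3, 1/3, 0], the empirical conditional probabilities
```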

In general, the $(P^f_p)_i$ will not quite match the corresponding conditional probabilities. For normalization purposes, we define

$$P_i(p, f) = \frac{(P^f_p)_i}{\sum_j (P^f_p)_j} \qquad (5)$$

No normalization is used during training, however.
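The normalization of equation (5) amounts to a post-processing step applied only when the outputs are used as probabilities (a sketch; the function name is ours, and we assume non-negative raw outputs):

```python
import numpy as np

def normalize(outputs):
    """Equation (5): divide each output component by the sum of all components,
    so the network's raw outputs can be read as a probability distribution."""
    outputs = np.asarray(outputs, dtype=float)
    return outputs / outputs.sum()

raw = [0.5, 0.3, 0.1]            # raw (unnormalized) outputs of the predictor
print(normalize(raw))            # components now sum to 1
```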

Juergen Schmidhuber 2003-02-25