
A PREDICTOR OF CONDITIONAL PROBABILITIES

Assume that the alphabet contains $k$ possible characters $z_1, z_2, \ldots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position). $P$ has $nk$ input units and $k$ output units. $n$ is called the ``time-window size''. We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $r(z_0)$, is the $k$-dimensional zero-vector. The $m$-th character of file $f$ (starting from the first default character) is called $c^f_m$.
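The local representation and the time-window input can be sketched as follows. This is an illustrative Python sketch, not code from the paper; the names (`ALPHABET`, `r`, `make_input`) and the use of `None` for the default character $z_0$ are assumptions made for the example.

```python
# Sketch of the local (one-hot) character representation and the
# time-window input. All names here are illustrative, not from the paper.

K = 4                              # alphabet size k
ALPHABET = ['a', 'b', 'c', 'd']    # z_1 ... z_k
N = 3                              # time-window size n

def r(ch):
    """Local representation: one-hot vector for z_i, and the
    k-dimensional zero vector for the default character z_0
    (represented here by None)."""
    v = [0.0] * K
    if ch is not None:
        v[ALPHABET.index(ch)] = 1.0
    return v

def make_input(chars, m):
    """Concatenate r(c_{m-n}) ... r(c_{m-1}) into one nk-dimensional
    input vector for the predictor P (0-based indexing on `chars`)."""
    vec = []
    for j in range(m - N, m):
        vec.extend(r(chars[j]))
    return vec

# Each file is padded with n default characters at the front.
padded = [None] * N + list("abca")
x = make_input(padded, N)   # input used to predict the first real character
```

Note that the window preceding the first real character consists entirely of default characters, so the corresponding input is the $nk$-dimensional zero vector.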

For all $f \in F$ and all possible $m>n$, $P$ receives as an input

\begin{displaymath}
r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1}) ,
\end{displaymath} (1)

where $\circ$ denotes vector concatenation. $P$ produces as output a $k$-dimensional vector $P^f_m$. Using back-propagation [6][2][3][4], $P$ is trained to minimize


\begin{displaymath}
\frac{1}{2}
\sum_{f \in F}
\sum_{m > n}
\left\| r(c^f_{m}) - P^f_m \right\| ^2 .
\end{displaymath} (2)
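A minimal training sketch for objective (2) follows. The paper trains a multilayer network with back-propagation; for brevity this sketch uses a single linear layer, which still illustrates gradient descent on the squared-error target. All names and hyperparameters are assumptions for the example.

```python
import numpy as np

# Sketch of training P on objective (2) for one window/target pair.
# A single linear layer (W, b) stands in for the paper's multilayer net.

rng = np.random.default_rng(0)
K, N = 4, 3                     # alphabet size k, window size n
W = rng.normal(0.0, 0.1, (K, N * K))
b = np.zeros(K)

def step(x, target, lr=0.1):
    """One gradient step on (1/2)||r(c_m) - P_m||^2 for a single window."""
    global W, b
    p = W @ x + b               # network output P^f_m
    err = p - target            # gradient of the squared error w.r.t. p
    W -= lr * np.outer(err, x)
    b -= lr * err
    return 0.5 * float(err @ err)

# Toy data: a fixed window and a one-hot target r(c_m).
x = np.zeros(N * K); x[0] = 1.0
target = np.array([0.0, 1.0, 0.0, 0.0])
losses = [step(x, target) for _ in range(200)]
```

In practice the sums in (2) run over all files and all positions; the single-sample loop above only shows the shape of one update.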

Expression (2) is minimal if $ P^f_m$ always equals

\begin{displaymath}
E( r(c^f_{m}) \mid c^f_{m-n}, \ldots, c^f_{m-1}),
\end{displaymath} (3)

the conditional expectation of $r(c^f_{m})$, given $r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1})$. Due to the local character representation, this is equivalent to $(P^f_m)_i$ being equal to the conditional probability
\begin{displaymath}
Pr(c^f_m = z_i \mid c^f_{m-n}, \ldots, c^f_{m-1})
\end{displaymath} (4)

for all $f$ and for all appropriate $m>n$, where $(P^f_m)_j$ denotes the $j$-th component of the vector $ P^f_m$.
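The equivalence of (3) and (4) can be checked numerically: with one-hot targets, averaging $r(c_m)$ over all occurrences of a context yields exactly the vector of empirical conditional probabilities. The toy corpus and names below are made up for illustration, using a window of size 1 for simplicity.

```python
from collections import defaultdict

# Why (3) equals (4): the conditional expectation of a one-hot vector
# is the vector of conditional probabilities. Toy example, n = 1.

K = 2
ALPHABET = ['a', 'b']
text = "aababaabab"

counts = defaultdict(lambda: [0] * K)
for prev, cur in zip(text, text[1:]):
    counts[prev][ALPHABET.index(cur)] += 1

def cond_expectation(context):
    """E(r(c_m) | c_{m-1}): component i is Pr(c_m = z_i | c_{m-1})."""
    c = counts[context]
    total = sum(c)
    return [ci / total for ci in c]
```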

In general, the $(P^f_m)_i$ will not quite match the corresponding conditional probabilities. For normalization purposes, we define

\begin{displaymath}
P^f_m(i) = \frac {(P^f_m)_i}{\sum_{j=1}^k (P^f_m)_j }.
\end{displaymath} (5)

No normalization is used during training, however.
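The normalization in (5) amounts to rescaling the raw output vector so its components sum to one. A minimal sketch (the function name is an assumption):

```python
# Sketch of the normalization in (5): rescale the raw outputs so they
# sum to one. Applied only when the outputs are used as probabilities,
# not during training.

def normalize(p):
    """P^f_m(i) = (P^f_m)_i / sum_j (P^f_m)_j."""
    s = sum(p)
    return [pi / s for pi in p]
```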


Juergen Schmidhuber 2003-02-25