Next: E. COMPARISON OF METHODS
Up: D. METHOD 3
Previous: D. METHOD 3
Like with METHODs 1 and 2, the ``time-window'' corresponding to the
predictor input is sequentially shifted across the unknown text file.
The , however, are used in a different way.
The character whose representation has minimal Euclidean
distance to is taken
as the predictor's deterministic prediction (if there is more
than one character with minimal distance to the output, then we
take the one with lowest ASCII value).
If
does not match the prediction,
then it is stored in a second
file, together with a number indicating how many characters were
processed since the last non-matching character
[6][7].
Expected characters are simply ignored - they represent redundant
information.
To avoid confusions between unexpected numbers
from the original file and
numbers indicating how many correct predictions went
by since the last wrong prediction, we introduce an escape character
to mark unexpected number characters in the second file.
The escape character is used
to mark unexpected escape characters, too.
Finally we apply Huffman-Coding
(as embodied by the UNIX function pack)
to the second file and obtain the final compressed file.
The ``uncompress'' algorithm works
as follows: we first unpack the compressed file by inverse
Huffman-Coding (as employed by the UNIX function unpack).
Then, starting from default characters,
the predictor sequentially tries to predict
each character of the original file from the previous characters
(deterministic predictions are obtained like with the
compression procedure above.)
The numbers in the unpacked file contain all information
about which predictions are wrong, and the associated characters
tell us how to correct wrong predictions:
if the unpacked file indicates that
the current prediction is correct, it is
fed back to the predictor input and becomes part of the
basis for the next prediction.
If the unpacked file indicates that
the current prediction is wrong, the corresponding
entry in the unpacked file (the correct character associated
with the number indicating how many correct predictions went
by since the last unexpected character)
replaces the prediction and
is fed back to the predictor input where it becomes part of the
basis for the next prediction.
Next: E. COMPARISON OF METHODS
Up: D. METHOD 3
Previous: D. METHOD 3
Juergen Schmidhuber
2003-02-19