Next: About this document ... Up: COMPRESSING TEXTS WITH NEURAL Previous: VI. ACKNOWLEDGMENT


T. C. Bell, J. G. Cleary, and I. H. Witten,
Text Compression,
Prentice Hall, Englewood Cliffs, NJ, 1990.

J. Ziv and A. Lempel,
A universal algorithm for sequential data compression,
IEEE Transactions on Information Theory, IT-23(3):337-343 (1977).

A. Wyner and J. Ziv,
Fixed data base version of the Lempel-Ziv data compression algorithm,
IEEE Transactions on Information Theory, 37:878-880 (1991).

G. Held,
Data Compression,
John Wiley & Sons, New York, 1991.

I. H. Witten, R. M. Neal, and J. G. Cleary,
Arithmetic coding for data compression,
Communications of the ACM, 30(6):520-540 (1987).

J. H. Schmidhuber,
Learning unambiguous reduced sequence descriptions,
in J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, San Mateo, CA: Morgan Kaufmann, 1992, pp. 291-298.

J. H. Schmidhuber,
Learning complex, extended sequences using the principle of history compression,
Neural Computation, 4(2):234-242 (1992).

P. J. Werbos,
Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,
PhD thesis, Harvard University, 1974.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams,
Learning internal representations by error propagation,
in D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, Cambridge, MA: MIT Press, 1986, pp. 318-362.

J. H. Schmidhuber, M. C. Mozer, and D. Prelinger,
Continuous history compression,
in H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, Augustinus, 1993, pp. 87-95.

J. H. Schmidhuber,
Learning factorial codes by predictability minimization,
Neural Computation, 4(6):863-879 (1992).

J. H. Schmidhuber and D. Prelinger,
Discovering predictable classifications,
Neural Computation, 5(4):625-635 (1993).

S. Lindstädt,
Comparison of two unsupervised neural network models for redundancy reduction,
In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, editors, Proc. of the 1993 Connectionist Models Summer School, Hillsdale, NJ: Erlbaum Associates, 1993, pp. 308-315.

Table 1: Average compression ratios (and corresponding variances) of various compression algorithms tested on short German text files ($<20000$ bytes) from the unknown test set (Münchner Merkur).

Method                                 Av. compression ratio   Variance
Huffman Coding (UNIX: pack)            1.74                    0.0002
Lempel-Ziv Coding (UNIX: compress)     1.99                    0.0014
METHOD 3, $n=5$                        2.20                    0.0014
Improved Lempel-Ziv (UNIX: gzip -9)    2.29                    0.0033
METHOD 1, $n=5$                        2.70                    0.0158
METHOD 2, $n=5$                        2.72                    0.0234
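The metric reported in these tables is the compression ratio: original size divided by compressed size, so higher is better. As a minimal sketch (not the paper's setup), the baseline ratios can be computed with Python's standard zlib module, which implements DEFLATE, the same algorithm used by gzip -9; the German sample text here is illustrative, not drawn from the paper's corpora.

```python
# Sketch: computing a compression ratio (original size / compressed size).
# zlib implements DEFLATE, the algorithm behind gzip; level 9 matches gzip -9.
# The repeated sample text is hypothetical and highly redundant, so the
# resulting ratio will be much higher than for real newspaper text.
import zlib

text = ("Dies ist ein kurzer deutscher Beispieltext. " * 50).encode("latin-1")
compressed = zlib.compress(text, 9)
ratio = len(text) / len(compressed)
print(f"compression ratio: {ratio:.2f}")
```

Note that ratios measured this way depend strongly on file length and redundancy, which is why the tables report averages and variances over many short files.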

Table 2: Average compression ratios and variances for the Frankenpost test set. The neural predictor was not retrained.

Method                                 Av. compression ratio   Variance
Huffman Coding (UNIX: pack)            1.67                    0.0003
Lempel-Ziv Coding (UNIX: compress)     1.71                    0.0036
METHOD 3, $n=5$                        1.99                    0.0013
Improved Lempel-Ziv (UNIX: gzip -9)    2.03                    0.0099
METHOD 1, $n=5$                        2.25                    0.0077
METHOD 2, $n=5$                        2.20                    0.0112

Juergen Schmidhuber 2003-02-19