next up previous


We implemented both alternative variants of the encoding and decoding procedure described above.

Our current computing environment prohibits extensive experimental evaluations of the method. The predictor updates turn out to be quite time consuming, which makes special neural net hardware recommendable. The limited software simulations presented in this section, however, will show that the ``neural'' compression technique can achieve ``excellent'' compression ratios. Here the term ``excellent'' is defined by a statement from [1]:

"In general, good algorithms can be expected to achieve an average compression ratio of 1.5, while excellent algorithms based upon sophisticated processing techniques will achieve an average compression ratio exceeding 2.0."
Here the average compression ratio is the average ratio between the lengths of original and compressed files.

The method was applied to German newspaper articles. The results were compared to those obtained with standard encoding techniques provided by the operating system UNIX, namely ``pack'', ``compress'', and ``gzip''. The corresponding decoding algorithms are ``unpack'', ``uncompress'', and ``gunzip'', respectively. ``pack'' is based on Huffman-Coding (e.g. [1]), while ``compress'' and ``gzip'' are based on techniques developed by Lempel and Ziv (e.g. [9]). As the file size goes to infinity, Lempel-Ziv becomes asymptotically optimal in a certain information theoretic sense [8]. This does not necessarily mean, however, that Lempel-Ziv is optimal for finite file sizes.

The training set for the predictor was given by a set of 40 articles from the newspaper Münchner Merkur, each containing between 10000 and 20000 characters. The alphabet consisted of $k=80$ possible characters, including upper case and lower case letters, digits, interpunction symbols, and special German letters like ``ö'', ``ü'', ``ä''. $P$ had 430 hidden units. A ``true'' unit with constant activation 1.0 was connected to all hidden and output units. The learning rate was 0.2. The training phase consisted of 25 sweeps through the training set.

The test set consisted of newspaper articles excluded from the training set, each containing between 10000 and 20000 characters. Table 1 lists the average compression ratios. The ``neural'' method outperformed the strongest conventional competitor, the UNIX ``gzip'' function based on a Lempel-Ziv algorithm.

Table: Compression ratios of various compression algorithms for short German text files ($<20000$ Bytes) from the unknown test set.
Method Compression Ratio  
Huffman Coding (UNIX: pack) 1.74  
Lempel-Ziv Coding (UNIX: compress) 1.99  
Improved Lempel-Ziv ( UNIX: gzip -9) 2.29  
Neural predictor + Huffman Coding, $n=5$ 2.70  
Neural predictor + Arithmetic Coding, $n=5$ 2.72  

How does a neural net trained on articles from Münchner Merkur perform on articles from other sources? Without retraining the neural predictor, we applied all competing methods to 10 articles from another German newspaper (the Frankenpost). The results are given in table 2.

Table: Compression ratios for articles from a different newspaper. The neural predictor was not retrained.
Method Compression Ratio  
Huffman Coding (UNIX: pack) 1.67  
Lempel-Ziv Coding (UNIX: compress) 1.71  
Improved Lempel-Ziv ( UNIX: gzip -9) 2.03  
Neural predictor + Huffman Coding, $n=5$ 2.25  
Neural predictor + Arithmetic Coding, $n=5$ 2.20  

The Frankenpost articles were harder to compress for all algorithms. But relative performance remained comparable.

Note that the time-window was quite small ($n=5$). In general, larger time windows will make more information available to the predictor. In turn, this will improve the prediction quality and increase the compression ratio. Therefore we expect to obtain even better results for $n>5$ and for recurrent predictor networks.

next up previous
Juergen Schmidhuber 2003-02-25