Because a look-up table would be extremely inefficient. It requires one entry for the conditional probability of every possible next character given every possible combination of previous characters, so its size grows exponentially with the number of previous characters. In addition, a special procedure is required for dealing with previously unseen combinations of input characters. In contrast, the size of a neural net typically grows in proportion to the square of the number of input units (assuming the number of hidden units grows in proportion to the number of input units), and its inherent ``generalization capability'' is expected to handle previously unseen combinations of input characters (hopefully by coming up with good predicted probabilities).
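The size argument can be illustrated with a rough count. The following is a minimal sketch, not the setup used in the text: it assumes a one-hot (local) encoding of each previous character, a single hidden layer, and no bias weights. The function names and the `hidden_ratio` parameter are hypothetical and chosen only for illustration.

```python
def lookup_table_entries(alphabet_size: int, context_length: int) -> int:
    """Entries needed to store P(next char | previous chars) for every case.

    There are alphabet_size ** context_length possible contexts, and each
    context needs one probability per possible next character.
    """
    return alphabet_size ** (context_length + 1)


def network_weights(alphabet_size: int, context_length: int,
                    hidden_ratio: float = 1.0) -> int:
    """Weight count of a one-hidden-layer net (assumed architecture).

    Assumptions: each previous character is one-hot encoded, the number of
    hidden units is hidden_ratio times the number of input units, and bias
    weights are ignored.
    """
    n_inputs = alphabet_size * context_length
    n_hidden = int(hidden_ratio * n_inputs)
    n_outputs = alphabet_size  # one predicted probability per next character
    return n_inputs * n_hidden + n_hidden * n_outputs


if __name__ == "__main__":
    # Compare both counts for a byte alphabet and growing context length.
    for n in (1, 2, 3, 5):
        print(n, lookup_table_entries(256, n), network_weights(256, n))
```

Under these assumptions the table size is exponential in the context length, while the weight count is dominated by the input-to-hidden term, which is quadratic in the number of input units.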

Juergen Schmidhuber 2003-02-25