Recurrent nets discover new motifs for protein classification

Sepp Hochreiter

Abstract: We apply recurrent nets to protein classification. We employ the Long Short-Term Memory (LSTM) recurrent net because LSTM can store the occurrence of certain amino acid patterns while scanning the sequence, whereas other architectures cannot store patterns over extended periods. Through its gating mechanism, the LSTM architecture detects parts of the amino acid sequence that have dependencies with the class label; that is, LSTM extracts motifs indicating the protein class. Compared with traditional alignment methods on the PROSITE protein database, LSTM yields a lower misclassification rate and finds new motifs. Superimposing the LSTM-extracted motifs yields a motif equal to the one found by alignment methods. Thus, LSTM generalizes alignment methods by identifying dependencies within motifs and, additionally, can correlate motifs that are far apart in the sequence.
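To make the scanning and gating idea concrete, here is a minimal sketch of an LSTM cell reading a one-hot encoded amino acid sequence one residue at a time. It is an illustration under stated assumptions, not the model used in this work: it uses the commonly taught LSTM formulation with input, forget, and output gates, the weights are random and untrained, and the names `one_hot`, `init_params`, `affine`, and `lstm_scan` are hypothetical. The per-position input-gate activations are recorded because, in a trained network, they are one plausible place to look for sequence positions the network chose to store.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
BLOCKS = ("input_gate", "forget_gate", "output_gate", "candidate")


def one_hot(residue):
    """Encode a residue as a 20-dimensional indicator vector."""
    return [1.0 if residue == a else 0.0 for a in AMINO_ACIDS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_hidden, n_in=len(AMINO_ACIDS), seed=0):
    """Random, untrained weights (assumption: for illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        name: {
            "W": matrix(n_hidden, n_in),      # input-to-hidden weights
            "U": matrix(n_hidden, n_hidden),  # hidden-to-hidden weights
            "b": [0.0] * n_hidden,            # biases
        }
        for name in BLOCKS
    }


def affine(block, x, h):
    """Compute W x + U h + b for one gate or for the candidate input."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b
        for w_row, u_row, b in zip(block["W"], block["U"], block["b"])
    ]


def lstm_scan(sequence, params):
    """Scan a residue string; return final h, final c, and input-gate trace."""
    n_hidden = len(params["candidate"]["b"])
    h = [0.0] * n_hidden  # hidden state (cell output)
    c = [0.0] * n_hidden  # memory cell state
    input_gate_trace = []
    for residue in sequence:
        x = one_hot(residue)
        i = [sigmoid(v) for v in affine(params["input_gate"], x, h)]
        f = [sigmoid(v) for v in affine(params["forget_gate"], x, h)]
        o = [sigmoid(v) for v in affine(params["output_gate"], x, h)]
        g = [math.tanh(v) for v in affine(params["candidate"], x, h)]
        # The memory cell keeps old content (scaled by f) and adds new
        # content only to the extent that the input gate i opens.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        input_gate_trace.append(i)
    return h, c, input_gate_trace


if __name__ == "__main__":
    params = init_params(n_hidden=4)
    h, c, trace = lstm_scan("ACDKLH", params)
    print(len(trace), "positions scanned; final hidden state:", h)
```

Because the memory cell is updated additively, a pattern stored early in the sequence can persist until the end of the scan. In a classifier, the final hidden state would feed an output unit predicting the protein class; that unit and the training procedure are omitted here.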