
## DETAILS OF THE 2-NET CHUNKING ARCHITECTURE

The system described below is the on-line version of one representative of a number of variations of the basic principle described in section 4.1. See [Schmidhuber, 1991c] for various modifications.

Table 1 gives an overview of various time-dependent activation vectors relevant for the description of the algorithm. Additional notation: $\circ$ is the concatenation operator; $target(t) = 1$ if the teacher provides a target vector $d(t)$ at time $t$, and $target(t) = 0$ otherwise. If $target(t) = 0$, then $d(t)$ takes on some default value, e.g. the zero vector.
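For concreteness, these conventions can be collected in one display. The symbol names are the ones adopted in Table 1 below; reading A's input $i(t)$ as the concatenation $x(t) \circ d(t)$ is an assumption, though one consistent with the input-prediction formulation discussed after the table.

```latex
% Notation of this section: folding the target d(t) into A's input
% i(t) makes target prediction a special case of input prediction.
\[
  i(t) \;=\; x(t) \circ d(t), \qquad
  target(t) \;=\;
  \begin{cases}
    1, & \text{if the teacher provides a target vector } d(t) \text{ at time } t,\\
    0, & \text{otherwise (then } d(t) \text{ takes a default value, e.g. the zero vector)}.
  \end{cases}
\]
```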


Table 1: Definitions of symbols representing time-dependent activation vectors. $\circ$ is the concatenation operator. $d^A(t)$ and $x^A(t)$ are based on the previous inputs $i(1), \ldots, i(t-1)$ and are computed without knowledge about $x(t)$ and $d(t)$.
| vector | description (referring to time $t$) | dimension |
|---|---|---|
| $x(t)$ | `normal' environmental input | $n_x$ |
| $d(t)$ | teacher-defined target | $n_d$ |
| $i(t) = x(t) \circ d(t)$ | A's input | $n_x + n_d$ |
| $h^A(t)$ | A's hidden activations | $n^A_h$ |
| $d^A(t)$ | A's prediction of $d(t)$ | $n_d$ |
| $x^A(t)$ | A's prediction of $x(t)$ | $n_x$ |
| $u(t)$ | unique representation of time $t$ | $n_u$ |
| $h^C(t)$ | C's hidden activations | $n^C_h$ |
| $d^C(t)$ | C's prediction of C's next target input | $n_d$ |
| $x^C(t)$ | C's prediction of C's next `normal' input | $n_x$ |
| $u^C(t)$ | C's prediction of C's next `time' input | $n_u$ |
| $s^A(t)$ | A's prediction of C's state $s^C(t)$ | $n^C_h + n_d + n_x + n_u$ |

A has $n_x + n_d$ input units, $n^A_h$ hidden units, and $(n_d + n_x) + (n^C_h + n_d + n_x + n_u)$ output units for $d^A(t)$, $x^A(t)$, and $s^A(t)$ (see Table 1). With pure prediction tasks, $n_d = 0$. C has $n^C_h$ hidden units and $n_d + n_x + n_u$ output units. All of A's input and hidden units have directed connections to all of A's hidden and output units. All input units of A have directed connections to all hidden and output units of C, because A's input units serve as input units for C at certain time steps. There are $n_u$ additional input units for C that provide unique representations of the current time step; these additional input units also have directed connections to all hidden and output units of C. All hidden units of C have directed connections to all hidden and output units of C.
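To make these unit counts and the connectivity pattern concrete, here is a minimal, shape-only sketch; all numeric sizes and names (`n_x`, `W_A`, and so on) are illustrative assumptions, and no training logic is implied.

```python
import numpy as np

# Illustrative sizes (assumptions, not values from the paper).
n_x, n_d, n_u = 4, 2, 8        # `normal' input, target, time-code widths
nA_h, nC_h = 12, 12            # hidden units of automatizer A, chunker C

n_i    = n_x + n_d             # A's input i(t) = x(t) concat d(t)
nC_out = n_d + n_x + n_u       # C's outputs: d^C, x^C, u^C
nC_st  = nC_h + nC_out         # C's state: hidden plus output units
nA_out = n_d + n_x + nC_st     # A's outputs: d^A, x^A, s^A

rng = np.random.default_rng(0)

# A: every input and hidden unit of A connects to every hidden and
# output unit of A (rows: receiving units, columns: sending units).
W_A = rng.normal(0.0, 0.1, size=(nA_h + nA_out, n_i + nA_h))

# C: A's input units plus the n_u time-code input units connect to all
# of C's hidden and output units; C's hidden units also connect to all
# of C's hidden and output units, making C recurrent.
W_C = rng.normal(0.0, 0.1, size=(nC_h + nC_out, n_i + n_u + nC_h))

print(W_A.shape, W_C.shape)
```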

A will try to make $d^A(t)$ equal to $d(t)$ if $target(t) = 1$, and it will try to make $x^A(t)$ equal to $x(t)$, thus trying to predict $i(t)$. Here again the target prediction problem is defined as a special case of an input prediction problem. C will try to make $d^C(t)$ equal to the externally provided teaching vector $d(t)$ if $target(t) = 1$ and if A failed to emit $d(t)$. Furthermore, it will always try to make $x^C(t)$ and $u^C(t)$ equal to the next non-teaching input to be processed by C, i.e. to the `normal' and `time' components of C's next input. This input may be many time steps ahead. Finally, and most importantly, A will try to make $s^A(t)$ equal to $s^C(t) = h^C(t) \circ d^C(t) \circ x^C(t) \circ u^C(t)$, thus trying to predict the state of C. The activations of C's output units are considered as part of its state.
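Written as one error sum (the quadratic error measure is an assumed, conventional choice; the text only fixes the prediction targets), A's tasks at time $t$ are:

```latex
% A's three prediction tasks at time t, collected in one squared-error
% sum; target(t) gates the teacher term.  The quadratic form is an
% assumption, the three targets are those stated in the text.
\[
  E^A(t) \;=\; target(t)\,\lVert d(t) - d^A(t) \rVert^2
        \;+\; \lVert x(t) - x^A(t) \rVert^2
        \;+\; \lVert s^C(t) - s^A(t) \rVert^2 .
\]
```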

Both C and A are trained simultaneously by a conventional algorithm for recurrent networks in an on-line fashion. Both the IID algorithm and BPTT are appropriate. In particular, computationally inexpensive variants of BPTT [Williams and Peng, 1990] are interesting: there are tasks with hierarchical temporal structure where only a few iterations of `back-propagation back into time' per time step are in principle sufficient to bridge arbitrary time lags (see section 5).
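To illustrate what `computationally inexpensive' means here, the fragment below sketches one weight update of truncated BPTT in the spirit of [Williams and Peng, 1990], assuming a plain $\tanh$ recurrent layer with weight matrix `W`; the layer type, the names, and the learning rate are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def truncated_bptt_step(W, states, inputs, delta_out, h_steps=2, lr=0.1):
    """One on-line update with truncated BPTT: the error delta_out
    arriving at the newest hidden state is propagated back through
    only h_steps previous time steps.  Assumes the toy dynamics
    h_k = tanh(W @ concat(x_k, h_{k-1})); states[-1] is the newest
    hidden state, states[-h_steps-1] the oldest one needed."""
    dW = np.zeros_like(W)
    delta = delta_out                             # dE/dh at newest step
    for k in range(1, h_steps + 1):
        h_prev = states[-k - 1]                   # h_{t-k}
        z = np.concatenate([inputs[-k], h_prev])  # unit's incoming vector
        delta = delta * (1.0 - states[-k] ** 2)   # back through tanh
        dW += np.outer(delta, z)                  # accumulate gradient
        delta = W[:, -h_prev.size:].T @ delta     # pass error to h_{t-k}
    W -= lr * dW                                  # gradient-descent step
    return W
```

With `h_steps` fixed, the per-step cost is constant in sequence length; the claim above is that one or two backward iterations can suffice for the hierarchical tasks of section 5.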

I now describe the (quite familiar) procedure for updating activations in a net.

Repeat for a constant number of iterations (typically one or two): every non-input unit computes its new activation by applying a differentiable activation function (e.g. the logistic function) to the sum of the activations of all units it receives directed connections from, each multiplied by the corresponding connection weight.

I now specify the input-output behavior of the chunker and the automatizer, as well as the details of error injection.
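Since the remaining specification is procedural, a compact sketch of one time step may make the interplay concrete. Everything named below is a stand-in: `A` and `C` are assumed recurrent nets with placeholder methods, `time_code(t)` is a hypothetical generator of the unique time representation $u(t)$, and the threshold `eps` is one possible way of deciding that A `failed'. This is an illustrative reading of the text above, not the paper's literal procedure.

```python
import numpy as np

def chunker_step(t, x_t, d_t, target_t, A, C, time_code, eps=0.05):
    """One on-line time step of the chunker/automatizer pair.
    A and C are stand-in recurrent nets assumed to expose:
      predict()       -> current output vectors,
      inject_error(e) -> store an error signal for the next update,
      consume(v)      -> update activations on input v,
      state()         -> hidden plus output activations (C only).
    time_code(t) is a hypothetical unique representation u(t)."""
    # A's predictions refer to time t but are computed before x(t)
    # and d(t) become visible.
    dA, xA, sA = A.predict()

    # Error injection for A: predict the current input i(t).
    A.inject_error(xA - x_t)
    if target_t:
        A.inject_error(dA - d_t)

    # C is fed (and trained) only where A failed to predict; the
    # threshold test is one possible definition of `failure'.
    a_failed = np.max(np.abs(xA - x_t)) > eps or (
        target_t and np.max(np.abs(dA - d_t)) > eps)
    if a_failed:
        dC, xC, uC = C.predict()   # emitted at C's previous update step
        # C's standing target: its next non-teaching input, which is
        # arriving only now, possibly many steps after the prediction.
        C.inject_error(xC - x_t)
        C.inject_error(uC - time_code(t))
        if target_t:               # teaching vector that A failed on
            C.inject_error(dC - d_t)
        C.consume(np.concatenate([x_t, d_t, time_code(t)]))

    # Finally, and most importantly: A tries to predict C's state.
    A.inject_error(sA - C.state())
    A.consume(np.concatenate([x_t, d_t]))
```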

