Searching without gradients

Next: Probabilistic target propagation Up: Remedies Previous: Ring's approach

Searching without gradients

The difficulty of learning long-term dependencies is strictly related to the continuous optimization approach that guides the search for a weight solution. One possibility for avoiding the problem is to resort to other kinds of search in weight space, in which the operators for generating another candidate weight solution are not based on continuous gradients. Bengio et al. [6] investigate methods such as simulated annealing, multi-grid random search, and discrete error propagation. Angeline et al. [1] (see also crossreference Chapter 15) propose a genetic approach that also avoids gradient computation. The simplest kind of search without gradient, however, simply randomly initializes all network weights until the resulting net happens to classify all training sequences correctly. In fact, as discussed in crossreference Chapter 9 of this book, simple weight guessing solves several popular benchmarks described in previous work faster than the recurrent net algorithms proposed therein (compare [13]). This does not mean that weight guessing is a good algorithm. It just means that the problems are very simple. More realistic tasks require either many free parameters (e.g., input weights) or high weight precision (e.g., for continuous-valued parameters), such that guessing becomes completely infeasible. Currently it is unclear to which extent more complex gradient-less methods can improve upon guessing in case of more realistic tasks. Bengio et al.'s approaches. Bengio et al. [6] investigate methods such as simulated annealing, multi-grid random search, time-weighted pseudo-Newton optimization, and discrete error propagation. Bengio and Frasconi [4] also propose an EM approach for propagating targets. With

so-called ``state networks'', at a given time, their system can be in one of only

different states. But to solve problems with continuous-valued inputs and outputs such systems would require an unacceptable number of states (i.e., state networks).

Next: Probabilistic target propagation Up: Remedies Previous: Ring's approach

Juergen Schmidhuber 2003-02-19