Very Deep Learning with Highway Networks

How can we train deep neural networks with very long credit assignment paths from inputs to outputs to solve complex AI problems? Many solutions to this problem have been proposed over the years, but most do not work well for very deep networks with arbitrary non-linear (and possibly recurrent) transformations between layers.

This project presents a different take on the problem: we simply design neural networks in a way that makes them easier to optimize, even at very large depths.

Our Highway Networks take inspiration from Long Short-Term Memory (LSTM) and allow training of deep, efficient networks (even with hundreds of layers) with conventional gradient-based methods. Even when large depths are not required, highway layers can be used instead of traditional neural layers to allow the network to adaptively copy or transform representations (see the sketch in the Code section below).

Papers

Training Very Deep Networks
R. K. Srivastava, K. Greff and J. Schmidhuber
Neural Information Processing Systems (NIPS 2015, spotlight). arXiv:1507.06228
Logs for all 800 optimization runs are available for download, with instructions.

Highway Networks
R. K. Srivastava, K. Greff and J. Schmidhuber
Deep Learning Workshop (ICML 2015). arXiv:1505.00387 (poster)

Code
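
As a quick reference, here is a minimal NumPy sketch of a single highway layer as defined in the papers: y = H(x, W_H) · T(x, W_T) + x · (1 − T(x, W_T)), where T is the sigmoid transform gate. The function names, initialization scales, and the ReLU choice for H are illustrative assumptions, not the project's released code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    The transform gate T decides, per unit, how much of the
    transformed signal H(x) to use versus how much of the raw
    input x to carry through unchanged.
    """
    H = np.maximum(0.0, W_H @ x + b_H)  # transform path (ReLU is one common choice)
    T = sigmoid(W_T @ x + b_T)          # transform gate, values in (0, 1)
    return T * H + (1.0 - T) * x        # gated mix of transform and carry

# Toy usage: a 4-unit layer whose gate bias favours copying at initialization.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
W_H = 0.1 * rng.standard_normal((d, d))
W_T = 0.1 * rng.standard_normal((d, d))
b_H = np.zeros(d)
b_T = np.full(d, -2.0)  # negative gate bias: the layer starts close to identity
print(highway_layer(x, W_H, b_H, W_T, b_T))
```

Note that the carry path (1 − T) · x requires the input and output dimensions of the layer to match.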

Frequently Asked Questions

Q: How do I set the bias for the transform gates when initializing a highway network?

A: You can think of the initial bias as a prior over the behavior of your network at initialization. In general, this is a hyper-parameter which will depend on the given problem and network architecture. However, a suggestion which has worked for several problems (see the papers above) is to initialize the transform gate bias to a negative value (e.g. -1 to -3), which biases the layers towards copying their inputs at initialization; deeper networks tend to benefit from more negative values.
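
The short sketch below illustrates the effect of this suggestion: with a negative bias, the transform gates are mostly closed at initialization, so each layer initially behaves close to an identity (copy) mapping. The function name and weight scale are hypothetical; the numeric bias values follow the suggestion above.

```python
import numpy as np

def mean_gate_activity(b_T, d=50, n=1000, seed=0):
    """Average transform gate activation T = sigmoid(W_T x + b_T)
    over random inputs, for a freshly initialized layer."""
    rng = np.random.default_rng(seed)
    W_T = rng.standard_normal((d, d)) / np.sqrt(d)  # small random weights
    x = rng.standard_normal((n, d))
    z = x @ W_T.T + b_T
    return float(np.mean(1.0 / (1.0 + np.exp(-z))))

for b in (0.0, -1.0, -3.0):
    print(f"b_T = {b:+.1f} -> mean gate activity ~ {mean_gate_activity(b):.2f}")
# b_T = 0 gives gates around 0.5 (half transform, half copy); more
# negative biases push the gates towards 0, i.e. towards copying.
```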

Q: Is the highway gating mechanism related to how information flow is regulated in the brain?

A: Information processing in the brain is not yet well understood. However, the idea that the brain uses similar gating mechanisms has been seriously considered by neuroscientists.

For example, see: Gisiger, T., & Boukadoum, M. (2011). Mechanisms Gating the Flow of Information in the Cortex: What They Might Look Like and What Their Uses May Be. Frontiers in Computational Neuroscience, 5, 1. http://doi.org/10.3389/fncom.2011.00001

Some Publications Which Use Highway Networks

  1. Kim, Yoon, et al. "Character-Aware Neural Language Models." arXiv preprint arXiv:1508.06615 (2015).
  2. Zhang, Yu, et al. "Highway Long Short-Term Memory RNNs for Distant Speech Recognition." arXiv preprint arXiv:1510.08983 (2015).