
Bibliography

Akaike, 1970
Akaike, H. (1970).
Statistical predictor identification.
Ann. Inst. Statist. Math., 22:203-217.

Amari and Murata, 1993
Amari, S. and Murata, N. (1993).
Statistical theory of learning curves under entropic loss criterion.
Neural Computation, 5(1):140-153.

Ash, 1989
Ash, T. (1989).
Dynamic node creation in backpropagation neural networks.
Connection Science, 1(4):365-375.

Craven and Wahba, 1979
Craven, P. and Wahba, G. (1979).
Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation.
Numer. Math., 31:377-403.

Eubank, 1988
Eubank, R. L. (1988).
Spline smoothing and nonparametric regression.
Marcel Dekker, New York.

Fahlman and Lebiere, 1990
Fahlman, S. E. and Lebiere, C. (1990).
The cascade-correlation learning architecture.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 525-532. Morgan Kaufmann.

Golub et al., 1979
Golub, G., Heath, M., and Wahba, G. (1979).
Generalized cross-validation as a method for choosing a good ridge parameter.
Technometrics, 21:215-224.

Guyon et al., 1992
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992).
Structural risk minimization for character recognition.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. Morgan Kaufmann.

Hassibi and Stork, 1993
Hassibi, B. and Stork, D. G. (1993).
Second order derivatives for network pruning: Optimal brain surgeon.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. Morgan Kaufmann.

Hastie and Tibshirani, 1990
Hastie, T. J. and Tibshirani, R. J. (1990).
Generalized additive models.
Monographs on Statistics and Applied Probability, 43. Chapman and Hall.

Hinton and van Camp, 1993
Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11-18. Springer.

Hochreiter and Schmidhuber, 1994
Hochreiter, S. and Schmidhuber, J. (1994).
Flat minimum search finds simple nets.
Technical Report FKI-200-94, Fakultät für Informatik, Technische Universität München.

Hochreiter and Schmidhuber, 1995
Hochreiter, S. and Schmidhuber, J. (1995).
Simplifying neural nets by discovering flat minima.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT Press.

Holden, 1994
Holden, S. B. (1994).
On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks.
PhD thesis, Cambridge University, Engineering Department.

Krogh and Hertz, 1992
Krogh, A. and Hertz, J. A. (1992).
A simple weight decay can improve generalization.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. Morgan Kaufmann.

Kullback, 1959
Kullback, S. (1959).
Information Theory and Statistics.
J. Wiley and Sons, New York.

LeCun et al., 1990
LeCun, Y., Denker, J. S., and Solla, S. A. (1990).
Optimal brain damage.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598-605. Morgan Kaufmann.

Levin et al., 1994
Levin, A. U., Leen, T. K., and Moody, J. E. (1994).
Fast pruning using principal components.
In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
To appear.

Levin et al., 1990
Levin, E., Tishby, N., and Solla, S. (1990).
A statistical approach to learning and generalization in layered neural networks.
Proceedings of the IEEE, 78(10):1568-1574.

MacKay, 1992a
MacKay, D. J. C. (1992a).
Bayesian interpolation.
Neural Computation, 4:415-447.

MacKay, 1992b
MacKay, D. J. C. (1992b).
A practical Bayesian framework for backpropagation networks.
Neural Computation, 4:448-472.

Møller, 1993
Møller, M. F. (1993).
Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time.
Technical Report PB-432, Computer Science Department, Aarhus University, Denmark.

Moody, 1989
Moody, J. E. (1989).
Fast learning in multi-resolution hierarchies.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1. Morgan Kaufmann.

Moody, 1992
Moody, J. E. (1992).
The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. Morgan Kaufmann.

Mosteller and Tukey, 1968
Mosteller, F. and Tukey, J. W. (1968).
Data analysis, including statistics.
In Lindzey, G. and Aronson, E., editors, Handbook of Social Psychology, Vol. 2. Addison-Wesley.

Mozer and Smolensky, 1989
Mozer, M. C. and Smolensky, P. (1989).
Skeletonization: A technique for trimming the fat from a network via relevance assessment.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107-115. Morgan Kaufmann.

Nowlan and Hinton, 1992
Nowlan, S. J. and Hinton, G. E. (1992).
Simplifying neural networks by soft weight sharing.
Neural Computation, 4:173-193.

Pearlmutter and Rosenfeld, 1991
Pearlmutter, B. A. and Rosenfeld, R. (1991).
Chaitin-Kolmogorov complexity and generalization in neural networks.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. Morgan Kaufmann.

Refenes et al., 1994
Refenes, A. N., Francis, G., and Zapranis, A. D. (1994).
Stock performance modeling using neural networks: A comparative study with regression models.
Neural Networks.

Rissanen, 1978
Rissanen, J. (1978).
Modeling by shortest data description.
Automatica, 14:465-471.

Schmidhuber, 1994a
Schmidhuber, J. (1994a).
Discovering problem solutions with low Kolmogorov complexity and high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München.
Short version in A. Prieditis and S. Russell, eds., Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, pages 488-496, San Francisco, CA, 1995.

Schmidhuber, 1994b
Schmidhuber, J. (1994b).
On learning how to learn learning strategies.
Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München.
Revised 1995.

Shannon, 1948
Shannon, C. E. (1948).
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, 27:379-423 and 623-656.

Stone, 1974
Stone, M. (1974).
Cross-validatory choice and assessment of statistical predictions.
Journal of the Royal Statistical Society, Series B, 36:111-147.

Vapnik, 1992
Vapnik, V. (1992).
Principles of risk minimization for learning theory.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. Morgan Kaufmann.

Wallace and Boulton, 1968
Wallace, C. S. and Boulton, D. M. (1968).
An information measure for classification.
Computer Journal, 11(2):185-194.

Wang et al., 1994
Wang, C., Venkatesh, S. S., and Judd, J. S. (1994).
Optimal stopping and effective machine complexity in learning.
In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
To appear.

White, 1989
White, H. (1989).
Learning in artificial neural networks: A statistical perspective.
Neural Computation, 1(4):425-464.

Williams, 1994
Williams, P. M. (1994).
Bayesian regularisation and pruning using a Laplace prior.
Technical report, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton.

Wolpert, 1994a
Wolpert, D. H. (1994a).
Bayesian backpropagation over I-O functions rather than weights.
In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 200-207. Morgan Kaufmann.

Wolpert, 1994b
Wolpert, D. H. (1994b).
The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework.
Technical Report SFI-TR-03-123, Santa Fe Institute, NM 87501.


