
Bibliography

Akaike, 1970
Akaike, H. (1970).
Statistical predictor identification.
Ann. Inst. Statist. Math., 22:203-217.

Amari and Murata, 1993
Amari, S. and Murata, N. (1993).
Statistical theory of learning curves under entropic loss criterion.
Neural Computation, 5(1):140-153.

Ash, 1989
Ash, T. (1989).
Dynamic node creation in backpropagation neural networks.
Connection Science, 1(4):365-375.

Craven and Wahba, 1979
Craven, P. and Wahba, G. (1979).
Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation.
Numer. Math., 31:377-403.

Eubank, 1988
Eubank, R. L. (1988).
Spline smoothing and nonparametric regression.
Marcel Dekker, New York.

Fahlman and Lebiere, 1990
Fahlman, S. E. and Lebiere, C. (1990).
The cascade-correlation learning architecture.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 525-532. Morgan Kaufmann.

Golub et al., 1979
Golub, G., Heath, M., and Wahba, G. (1979).
Generalized cross-validation as a method for choosing a good ridge parameter.
Technometrics, 21:215-224.

Guyon et al., 1992
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992).
Structural risk minimization for character recognition.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. Morgan Kaufmann.

Hassibi and Stork, 1993
Hassibi, B. and Stork, D. G. (1993).
Second order derivatives for network pruning: Optimal brain surgeon.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. Morgan Kaufmann.

Hastie and Tibshirani, 1990
Hastie, T. J. and Tibshirani, R. J. (1990).
Generalized additive models.
Monographs on Statistics and Applied Probability, 43. Chapman and Hall.

Hinton and van Camp, 1993
Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11-18. Springer.

Hochreiter and Schmidhuber, 1994
Hochreiter, S. and Schmidhuber, J. (1994).
Flat minimum search finds simple nets.
Technical Report FKI-200-94, Fakultät für Informatik, Technische Universität München.

Hochreiter and Schmidhuber, 1995
Hochreiter, S. and Schmidhuber, J. (1995).
Simplifying neural nets by discovering flat minima.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT Press.

Holden, 1994
Holden, S. B. (1994).
On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks.
PhD thesis, Cambridge University, Engineering Department.

Krogh and Hertz, 1992
Krogh, A. and Hertz, J. A. (1992).
A simple weight decay can improve generalization.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. Morgan Kaufmann.

Kullback, 1959
Kullback, S. (1959).
Information Theory and Statistics.
J. Wiley and Sons, New York.

LeCun et al., 1990
LeCun, Y., Denker, J. S., and Solla, S. A. (1990).
Optimal brain damage.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598-605. Morgan Kaufmann.

Levin et al., 1994
Levin, A. U., Leen, T. K., and Moody, J. E. (1994).
Fast pruning using principal components.
In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
To appear.

Levin et al., 1990
Levin, E., Tishby, N., and Solla, S. (1990).
A statistical approach to learning and generalization in layered neural networks.
Proceedings of the IEEE, 78(10):1568-1574.

MacKay, 1992a
MacKay, D. J. C. (1992a).
Bayesian interpolation.
Neural Computation, 4:415-447.

MacKay, 1992b
MacKay, D. J. C. (1992b).
A practical Bayesian framework for backpropagation networks.
Neural Computation, 4:448-472.

Møller, 1993
Møller, M. F. (1993).
Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time.
Technical Report PB-432, Computer Science Department, Aarhus University, Denmark.

Moody, 1989
Moody, J. E. (1989).
Fast learning in multi-resolution hierarchies.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1. Morgan Kaufmann.

Moody, 1992
Moody, J. E. (1992).
The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. Morgan Kaufmann.

Mosteller and Tukey, 1968
Mosteller, F. and Tukey, J. W. (1968).
Data analysis, including statistics.
In Lindzey, G. and Aronson, E., editors, Handbook of Social Psychology, Vol. 2. Addison-Wesley.

Mozer and Smolensky, 1989
Mozer, M. C. and Smolensky, P. (1989).
Skeletonization: A technique for trimming the fat from a network via relevance assessment.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107-115. Morgan Kaufmann.

Nowlan and Hinton, 1992
Nowlan, S. J. and Hinton, G. E. (1992).
Simplifying neural networks by soft weight sharing.
Neural Computation, 4:173-193.

Pearlmutter and Rosenfeld, 1991
Pearlmutter, B. A. and Rosenfeld, R. (1991).
Chaitin-Kolmogorov complexity and generalization in neural networks.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. Morgan Kaufmann.

Refenes et al., 1994
Refenes, A. N., Francis, G., and Zapranis, A. D. (1994).
Stock performance modeling using neural networks: A comparative study with regression models.
Neural Networks.

Rissanen, 1978
Rissanen, J. (1978).
Modeling by shortest data description.
Automatica, 14:465-471.

Schmidhuber, 1994a
Schmidhuber, J. (1994a).
Discovering problem solutions with low Kolmogorov complexity and high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München.
Short version in A. Prieditis and S. Russell, eds., Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, pages 488-496, San Francisco, CA, 1995.

Schmidhuber, 1994b
Schmidhuber, J. (1994b).
On learning how to learn learning strategies.
Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München.
Revised 1995.

Shannon, 1948
Shannon, C. E. (1948).
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, 27:379-423 and 623-656.

Stone, 1974
Stone, M. (1974).
Cross-validatory choice and assessment of statistical predictions.
Journal of the Royal Statistical Society, Series B, 36:111-147.

Vapnik, 1992
Vapnik, V. (1992).
Principles of risk minimization for learning theory.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. Morgan Kaufmann.

Wallace and Boulton, 1968
Wallace, C. S. and Boulton, D. M. (1968).
An information measure for classification.
Computer Journal, 11(2):185-194.

Wang et al., 1994
Wang, C., Venkatesh, S. S., and Judd, J. S. (1994).
Optimal stopping and effective machine complexity in learning.
In Advances in Neural Information Processing Systems 6. Morgan Kaufmann.
To appear.

White, 1989
White, H. (1989).
Learning in artificial neural networks: A statistical perspective.
Neural Computation, 1(4):425-464.

Williams, 1994
Williams, P. M. (1994).
Bayesian regularisation and pruning using a Laplace prior.
Technical report, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton.

Wolpert, 1994a
Wolpert, D. H. (1994a).
Bayesian backpropagation over I-O functions rather than weights.
In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 200-207. Morgan Kaufmann.

Wolpert, 1994b
Wolpert, D. H. (1994b).
The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework.
Technical Report SFI-TR-03-123, Santa Fe Institute, NM 87501.


