- Akaike, 1970
Akaike, H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math., 22:203-217.
- Amari and Murata, 1993
Amari, S. and Murata, N. (1993). Statistical theory of learning curves under entropic loss criterion. Neural Computation, 5(1):140-153.
- Ash, 1989
Ash, T. (1989). Dynamic node creation in backpropagation neural networks. Connection Science, 1(4):365-375.
- Craven and Wahba, 1979
Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31:377-403.
- Eubank, 1988
Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
- Fahlman and Lebiere, 1990
Fahlman, S. E. and Lebiere, C. (1990). The cascade-correlation learning architecture. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 525-532. Morgan Kaufmann.
- Golub et al., 1979
Golub, G., Heath, M., and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21:215-224.
- Guyon et al., 1992
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992). Structural risk minimization for character recognition. In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. Morgan Kaufmann.
- Hassibi and Stork, 1993
Hassibi, B. and Stork, D. G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. Morgan Kaufmann.
- Hastie and Tibshirani, 1990
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models. Monographs on Statistics and Applied Probability, 43.
- Hinton and van Camp, 1993
Hinton, G. E. and van Camp, D. (1993). Keeping neural networks simple. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11-18. Springer.
- Hochreiter and Schmidhuber, 1994
Hochreiter, S. and Schmidhuber, J. (1994). Flat minimum search finds simple nets. Technical Report FKI-200-94, Fakultät für Informatik, Technische Universität München.
- Hochreiter and Schmidhuber, 1995
Hochreiter, S. and Schmidhuber, J. (1995). Simplifying neural nets by discovering flat minima. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT Press.
- Holden, 1994
Holden, S. B. (1994). On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks. PhD thesis, Cambridge University, Engineering Department.
- Krogh and Hertz, 1992
Krogh, A. and Hertz, J. A. (1992). A simple weight decay can improve generalization. In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. Morgan Kaufmann.
- Kullback, 1959
Kullback, S. (1959). Information Theory and Statistics. J. Wiley and Sons, New York.
- LeCun et al., 1990
LeCun, Y., Denker, J. S., and Solla, S. A. (1990). Optimal brain damage. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598-605. Morgan Kaufmann.
- Levin et al., 1994
Levin, A. U., Leen, T. K., and Moody, J. E. (1994). Fast pruning using principal components. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann. To appear.
- Levin et al., 1990
Levin, E., Tishby, N., and Solla, S. (1990). A statistical approach to learning and generalization in layered neural networks. Proceedings of the IEEE, 78(10):1568-1574.
- MacKay, 1992a
MacKay, D. J. C. (1992a). Bayesian interpolation. Neural Computation, 4:415-447.
- MacKay, 1992b
MacKay, D. J. C. (1992b). A practical Bayesian framework for backprop networks. Neural Computation, 4:448-472.
- Møller, 1993
Møller, M. F. (1993). Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time. Technical Report PB-432, Computer Science Department, Aarhus University, Denmark.
- Moody, 1989
Moody, J. E. (1989). Fast learning in multi-resolution hierarchies. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1. Morgan Kaufmann.
- Moody, 1992
Moody, J. E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. Morgan Kaufmann.
- Mosteller and Tukey, 1968
Mosteller, F. and Tukey, J. W. (1968). Data analysis, including statistics. In Lindzey, G. and Aronson, E., editors, Handbook of Social Psychology, Vol. 2. Addison-Wesley.
- Mozer and Smolensky, 1989
Mozer, M. C. and Smolensky, P. (1989). Skeletonization: A technique for trimming the fat from a network via relevance assessment. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107-115. Morgan Kaufmann.
- Nowlan and Hinton, 1992
Nowlan, S. J. and Hinton, G. E. (1992). Simplifying neural networks by soft weight sharing. Neural Computation, 4:173-193.
- Pearlmutter and Rosenfeld, 1991
Pearlmutter, B. A. and Rosenfeld, R. (1991). Chaitin-Kolmogorov complexity and generalization in neural networks. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. Morgan Kaufmann.
- Refenes et al., 1994
Refenes, A. N., Francis, G., and Zapranis, A. D. (1994). Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks.
- Rissanen, 1978
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14:465-471.
- Schmidhuber, 1994a
Schmidhuber, J. (1994a). Discovering problem solutions with low Kolmogorov complexity and high generalization capability. Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München. Short version in A. Prieditis and S. Russell, eds., Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, pages 488-496, San Francisco, CA, 1995.
- Schmidhuber, 1994b
Schmidhuber, J. (1994b). On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München. Revised 1995.
- Shannon, 1948
Shannon, C. E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal, XXVII:379-423.
- Stone, 1974
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. B, 36:111-147.
- Vapnik, 1992
Vapnik, V. (1992). Principles of risk minimization for learning theory. In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. Morgan Kaufmann.
- Wallace and Boulton, 1968
Wallace, C. S. and Boulton, D. M. (1968). An information theoretic measure for classification. Computer Journal, 11(2):185-194.
- Wang et al., 1994
Wang, C., Venkatesh, S. S., and Judd, J. S. (1994). Optimal stopping and effective machine complexity in learning. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann. To appear.
- White, 1989
White, H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1(4):425-464.
- Williams, 1994
Williams, P. M. (1994). Bayesian regularisation and pruning using a Laplace prior. Technical report, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton.
- Wolpert, 1994a
Wolpert, D. H. (1994a). Bayesian backpropagation over I-O functions rather than weights. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 200-207. Morgan Kaufmann.
- Wolpert, 1994b
Wolpert, D. H. (1994b). The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. Technical Report SFI-TR-03-123, Santa Fe Institute, NM 87501.
Juergen Schmidhuber
2003-02-13