- Adleman, 1979
-
Adleman, L. (1979).
Time, space, and randomness.
Technical Report MIT/LCS/79/TM-131, Laboratory for Computer Science,
MIT.
- Allender, 1992
-
Allender, E. (1992).
Application of time-bounded Kolmogorov complexity in complexity
theory.
In Watanabe, O., editor, Kolmogorov complexity and
computational complexity, pages 6-22. EATCS Monographs on Theoretical
Computer Science, Springer.
- Amari and Murata, 1993
-
Amari, S. and Murata, N. (1993).
Statistical theory of learning curves under entropic loss criterion.
Neural Computation, 5(1):140-153.
- Atick et al., 1992
-
Atick, J. J., Li, Z., and Redlich, A. N. (1992).
Understanding retinal color coding from first principles.
Neural Computation, 4:559-572.
- Barlow, 1989
-
Barlow, H. B. (1989).
Unsupervised learning.
Neural Computation, 1(3):295-311.
- Barron, 1988
-
Barron, A. R. (1988).
Complexity regularization with application to artificial neural
networks.
In Nonparametric Functional Estimation and Related Topics,
pages 561-576. Kluwer Academic Publishers.
- Barto, 1989
-
Barto, A. G. (1989).
Connectionist approaches for control.
Technical Report COINS 89-89, University of Massachusetts, Amherst MA
01003.
- Barto et al., 1983
-
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983).
Neuronlike adaptive elements that can solve difficult learning
control problems.
IEEE Transactions on Systems, Man, and Cybernetics,
SMC-13:834-846.
- Barzdin, 1988
-
Barzdin, Y. M. (1988).
Algorithmic information theory.
In Reidel, D., editor, Encyclopaedia of Mathematics, volume 1,
pages 140-142. Kluwer Academic Publishers.
- Baum and Haussler, 1989
-
Baum, E. B. and Haussler, D. (1989).
What size net gives valid generalization?
Neural Computation, 1(1):151-160.
- Becker, 1991
-
Becker, S. (1991).
Unsupervised learning procedures for neural networks.
International Journal of Neural Systems, 2(1 & 2):17-33.
- Bennett, 1988
-
Bennett, C. H. (1988).
Logical depth and physical complexity.
In The Universal Turing Machine: A Half Century Survey,
volume 1, pages 227-258. Oxford University Press, Oxford and Kammerer &
Unverzagt, Hamburg.
- Blumer et al., 1987
-
Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1987).
Occam's razor.
Information Processing Letters, 24:377-380.
- Chaitin, 1966
-
Chaitin, G. (1966).
On the length of programs for computing finite binary sequences.
Journal of the ACM, 13:547-569.
- Chaitin, 1969
-
Chaitin, G. (1969).
On the length of programs for computing finite binary sequences:
statistical considerations.
Journal of the ACM, 16:145-159.
- Chaitin, 1975
-
Chaitin, G. (1975).
A theory of program size formally identical to information theory.
Journal of the ACM, 22:329-340.
- Chaitin, 1987
-
Chaitin, G. (1987).
Algorithmic Information Theory.
Cambridge University Press, Cambridge.
- Cover et al., 1989
-
Cover, T. M., Gács, P., and Gray, R. M. (1989).
Kolmogorov's contributions to information theory and algorithmic
complexity.
Annals of Probability, 17:840-865.
- Dayan and Sejnowski, 1994
-
Dayan, P. and Sejnowski, T. (1994).
TD(λ): Convergence with probability 1.
Machine Learning, 14:295-301.
- Deco et al., 1993
-
Deco, G., Finnoff, W., and Zimmermann, H. G. (1993).
Elimination of overtraining by a mutual information network.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 744-749. Springer.
- Dietterich, 1989
-
Dietterich, T. G. (1989).
Limitations of inductive learning.
In Proceedings of the Sixth International Workshop on Machine
Learning, Ithaca, NY, pages 124-128. San Francisco, CA: Morgan Kaufmann.
- Gács, 1974
-
Gács, P. (1974).
On the symmetry of algorithmic information.
Soviet Math. Dokl., 15:1477-1480.
- Gallant, 1990
-
Gallant, S. I. (1990).
A connectionist learning algorithm with provable generalization and
scaling bounds.
Neural Networks, 3:191-201.
- Gao and Li, 1989
-
Gao, Q. and Li, M. (1989).
The minimum description length principle and its application to
online learning of handprinted characters.
In Proc. 11th International Joint Conference on Artificial
Intelligence, Detroit, MI, pages 843-848.
- Guyon et al., 1992
-
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992).
Structural risk minimization for character recognition.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. San
Mateo, CA: Morgan Kaufmann.
- Hartmanis, 1983
-
Hartmanis, J. (1983).
Generalized Kolmogorov complexity and the structure of feasible
computations.
In Proc. 24th IEEE Symposium on Foundations of Computer
Science, pages 439-445.
- Hassibi and Stork, 1993
-
Hassibi, B. and Stork, D. G. (1993).
Second order derivatives for network pruning: Optimal brain surgeon.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. San
Mateo, CA: Morgan Kaufmann.
- Haussler, 1988
-
Haussler, D. (1988).
Quantifying inductive bias: AI learning algorithms and Valiant's
learning framework.
Artificial Intelligence, 36:177-221.
- Heil, 1995
-
Heil, S. (1995).
Universelle Suche und inkrementelles Lernen, diploma thesis.
Fakultät für Informatik, Lehrstuhl Prof. Brauer, Technische
Universität München.
- Hinton and van Camp, 1993
-
Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 11-18. Springer.
- Hochreiter and Schmidhuber, 1995
-
Hochreiter, S. and Schmidhuber, J. (1995).
Simplifying neural nets by discovering flat minima.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT
Press, Cambridge MA.
- Hochreiter and Schmidhuber, 1996
-
Hochreiter, S. and Schmidhuber, J. (1996).
Flat minima.
Neural Computation.
In press. Extended version available in WWW homepages of Hochreiter
and Schmidhuber.
- Kolmogorov, 1965
-
Kolmogorov, A. (1965).
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:1-11.
- Kolmogorov, 1933
-
Kolmogorov, A. N. (1933).
Grundbegriffe der Wahrscheinlichkeitsrechnung.
Springer, Berlin.
- Krogh and Hertz, 1992
-
Krogh, A. and Hertz, J. A. (1992).
A simple weight decay can improve generalization.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. San
Mateo, CA: Morgan Kaufmann.
- LeCun, 1985
-
LeCun, Y. (1985).
Une procédure d'apprentissage pour réseau à seuil
asymétrique.
Proceedings of Cognitiva 85, Paris, pages 599-604.
- LeCun et al., 1991
-
LeCun, Y., Kanter, I., and Solla, S. A. (1991).
Second order properties of error surfaces: Learning time and
generalization.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 918-924. San
Mateo, CA: Morgan Kaufmann.
- Levin, 1973a
-
Levin, L. A. (1973a).
On the notion of a random sequence.
Soviet Math. Dokl., 14(5):1413-1416.
- Levin, 1973b
-
Levin, L. A. (1973b).
Universal sequential search problems.
Problems of Information Transmission, 9(3):265-266.
- Levin, 1974
-
Levin, L. A. (1974).
Laws of information (nongrowth) and aspects of the foundation of
probability theory.
Problems of Information Transmission, 10(3):206-210.
- Levin, 1976
-
Levin, L. A. (1976).
Various measures of complexity for finite objects (axiomatic
description).
Soviet Math. Dokl., 17(2):522-526.
- Levin, 1984
-
Levin, L. A. (1984).
Randomness conservation inequalities: Information and independence in
mathematical theories.
Information and Control, 61:15-37.
- Li and Vitányi, 1989
-
Li, M. and Vitányi, P. M. B. (1989).
A theory of learning simple concepts under simple distributions and
average case complexity for the universal distribution.
In Proc. 30th Annual IEEE Symposium on Foundations of Computer
Science, pages 34-39.
- Li and Vitányi, 1993
-
Li, M. and Vitányi, P. M. B. (1993).
An Introduction to Kolmogorov Complexity and its
Applications.
Springer.
- Linsker, 1988
-
Linsker, R. (1988).
Self-organization in a perceptual network.
IEEE Computer, 21:105-117.
- Maass, 1994
-
Maass, W. (1994).
Perspectives of current research about the complexity of learning on
neural nets.
In Roychowdhury, V. P., Siu, K. Y., and Orlitsky, A., editors, Theoretical Advances in Neural Computation and Learning. Kluwer Academic
Publishers.
- MacKay, 1992
-
MacKay, D. J. C. (1992).
A practical Bayesian framework for backprop networks.
Neural Computation, 4:448-472.
- Martin-Löf, 1966
-
Martin-Löf, P. (1966).
The definition of random sequences.
Information and Control, 9:602-619.
- Milosavljevic and Jurka, 1993
-
Milosavljevic, A. and Jurka, J. (1993).
Discovery by minimal length encoding: A case study in molecular
evolution.
Machine Learning, 12:69-87.
- Moody, 1992
-
Moody, J. E. (1992).
The effective number of parameters: An analysis of generalization and
regularization in nonlinear learning systems.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. San
Mateo, CA: Morgan Kaufmann.
- Mozer and Smolensky, 1989
-
Mozer, M. C. and Smolensky, P. (1989).
Skeletonization: A technique for trimming the fat from a network via
relevance assessment.
In Touretzky, D. S., editor, Advances in Neural Information
Processing Systems 1, pages 107-115. San Mateo, CA: Morgan Kaufmann.
- Nowlan and Hinton, 1992
-
Nowlan, S. J. and Hinton, G. E. (1992).
Simplifying neural networks by soft weight sharing.
Neural Computation, 4:173-193.
- Parker, 1985
-
Parker, D. B. (1985).
Learning-logic.
Technical Report TR-47, Center for Comp. Research in Economics and
Management Sci., MIT.
- Paul and Solomonoff, 1991
-
Paul, W. and Solomonoff, R. J. (1991).
Autonomous theory building systems.
Manuscript, revised 1994.
- Pearlmutter and Rosenfeld, 1991
-
Pearlmutter, B. A. and Rosenfeld, R. (1991).
Chaitin-Kolmogorov complexity and generalization in neural
networks.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. San
Mateo, CA: Morgan Kaufmann.
- Pednault, 1989
-
Pednault, E. P. D. (1989).
Some experiments in applying inductive inference principles to
surface reconstruction.
In 11th IJCAI, pages 1603-1609. San Mateo, CA: Morgan
Kaufmann.
- Quinlan and Rivest, 1989
-
Quinlan, J. R. and Rivest, R. L. (1989).
Inferring decision trees using the minimum description length
principle.
Information and Computation, 80:227-248.
- Rissanen, 1978
-
Rissanen, J. (1978).
Modeling by shortest data description.
Automatica, 14:465-471.
- Rissanen, 1983
-
Rissanen, J. (1983).
A universal prior for integers and estimation by minimum description
length.
The Annals of Statistics, 11(2):416-431.
- Rissanen, 1986
-
Rissanen, J. (1986).
Stochastic complexity and modeling.
The Annals of Statistics, 14(3):1080-1100.
- Rumelhart et al., 1986
-
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning internal representations by error propagation.
In Parallel Distributed Processing, volume 1, pages 318-362.
MIT Press.
- Schaffer, 1993
-
Schaffer, C. (1993).
Overfitting avoidance as bias.
Machine Learning, 10:153-178.
- Schmidhuber, 1989
-
Schmidhuber, J. (1989).
The Neural Bucket Brigade: A local learning algorithm for dynamic
feedforward and recurrent networks.
Connection Science, 1(4):403-412.
- Schmidhuber, 1992a
-
Schmidhuber, J. (1992a).
Learning complex, extended sequences using the principle of history
compression.
Neural Computation, 4(2):234-242.
- Schmidhuber, 1992b
-
Schmidhuber, J. (1992b).
Learning factorial codes by predictability minimization.
Neural Computation, 4(6):863-879.
- Schmidhuber, 1993a
-
Schmidhuber, J. (1993a).
On decreasing the ratio between learning complexity and number of
time-varying variables in fully recurrent nets.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 460-463. Springer.
- Schmidhuber, 1993b
-
Schmidhuber, J. (1993b).
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 446-451. Springer.
- Schmidhuber, 1994
-
Schmidhuber, J. (1994).
Discovering problem solutions with low Kolmogorov complexity and
high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik,
Technische Universität München.
Short version in A. Prieditis and S. Russell, eds., Machine Learning:
Proceedings of the Twelfth International Conference, Morgan Kaufmann
Publishers, pages 488-496, San Francisco, CA, 1995.
- Schmidhuber, 1995
-
Schmidhuber, J. (1995).
Low-complexity art.
Accepted by Leonardo, Journal of the International Society for
the Arts, Sciences, and Technology.
- Schmidhuber, 1996
-
Schmidhuber, J. (1996).
A general method for incremental self-improvement and multi-agent
learning in unrestricted environments.
In Yao, X., editor, Evolutionary Computation: Theory and
Applications. Scientific Publ. Co., Singapore.
In press.
- Schmidhuber et al., 1996
-
Schmidhuber, J., Zhao, J., and Wiering, M. (1996).
Simple principles of metalearning.
Technical Report IDSIA-69-96, IDSIA.
- Schnorr, 1971
-
Schnorr, C. P. (1971).
A unified approach to the definition of random sequences.
Mathematical Systems Theory, 5:246-258.
- Shannon, 1948
-
Shannon, C. E. (1948).
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, XXVII:379-423.
- Solomonoff, 1964
-
Solomonoff, R. (1964).
A formal theory of inductive inference. Part I.
Information and Control, 7:1-22.
- Solomonoff, 1986
-
Solomonoff, R. (1986).
An application of algorithmic probability to problems in artificial
intelligence.
In Kanal, L. N. and Lemmer, J. F., editors, Uncertainty in
Artificial Intelligence, pages 473-491. Elsevier Science Publishers.
- Utgoff, 1986
-
Utgoff, P. (1986).
Shift of bias for inductive concept learning.
In Michalski, R., Carbonell, J., and Mitchell, T., editors, Machine Learning, volume 2, pages 163-190. Morgan Kaufmann, Los Altos, CA.
- Valiant, 1984
-
Valiant, L. G. (1984).
A theory of the learnable.
Communications of the ACM, 27:1134-1142.
- Vapnik, 1992
-
Vapnik, V. (1992).
Principles of risk minimization for learning theory.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. San
Mateo, CA: Morgan Kaufmann.
- Wallace and Boulton, 1968
-
Wallace, C. S. and Boulton, D. M. (1968).
An information theoretic measure for classification.
Computer Journal, 11(2):185-194.
- Watanabe, 1992
-
Watanabe, O. (1992).
Kolmogorov complexity and computational complexity.
EATCS Monographs on Theoretical Computer Science, Springer.
- Watkins, 1989
-
Watkins, C. (1989).
Learning from Delayed Rewards.
PhD thesis, King's College London.
- Weigend et al., 1990
-
Weigend, A. S., Huberman, B. A., and Rumelhart, D. E. (1990).
Predicting the future: A connectionist approach.
International Journal of Neural Systems, 1:193-209.
- Werbos, 1974
-
Werbos, P. J. (1974).
Beyond Regression: New Tools for Prediction and Analysis in the
Behavioral Sciences.
PhD thesis, Harvard University.
- Wiering and Schmidhuber, 1996
-
Wiering, M. and Schmidhuber, J. (1996).
Solving POMDPs with Levin search and EIRA.
In Saitta, L., editor, Machine Learning: Proceedings of the
Thirteenth International Conference, pages 534-542. Morgan Kaufmann
Publishers, San Francisco, CA.
- Williams, 1988
-
Williams, R. J. (1988).
Toward a theory of reinforcement-learning connectionist systems.
Technical Report NU-CCS-88-3, College of Comp. Sci., Northeastern
University, Boston, MA.
- Wolpert, 1993
-
Wolpert, D. H. (1993).
Technical Report SFI TR 93-03-016, Santa Fe Institute, NM 87501.
- Zhao and Schmidhuber, 1996
-
Zhao, J. and Schmidhuber, J. (1996).
Incremental self-improvement for life-time multi-agent reinforcement
learning.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson,
S. W., editors, From Animals to Animats 4: Proceedings of the Fourth
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
pages 516-525. MIT Press, Bradford Books.
- Zvonkin and Levin, 1970
-
Zvonkin, A. K. and Levin, L. A. (1970).
The complexity of finite objects and the algorithmic concepts of
information and randomness.
Russian Math. Surveys, 25(6):83-124.
Juergen Schmidhuber
2003-02-12