
Bibliography

Adleman, 1979
Adleman, L. (1979).
Time, space, and randomness.
Technical Report MIT/LCS/79/TM-131, Laboratory for Computer Science, MIT.

Allender, 1992
Allender, E. (1992).
Applications of time-bounded Kolmogorov complexity in complexity theory.
In Watanabe, O., editor, Kolmogorov complexity and computational complexity, pages 6-22. EATCS Monographs on Theoretical Computer Science, Springer.

Amari and Murata, 1993
Amari, S. and Murata, N. (1993).
Statistical theory of learning curves under entropic loss criterion.
Neural Computation, 5(1):140-153.

Atick et al., 1992
Atick, J. J., Li, Z., and Redlich, A. N. (1992).
Understanding retinal color coding from first principles.
Neural Computation, 4:559-572.

Barlow, 1989
Barlow, H. B. (1989).
Unsupervised learning.
Neural Computation, 1(3):295-311.

Barron, 1988
Barron, A. R. (1988).
Complexity regularization with application to artificial neural networks.
In Nonparametric Functional Estimation and Related Topics, pages 561-576. Kluwer Academic Publishers.

Barto, 1989
Barto, A. G. (1989).
Connectionist approaches for control.
Technical Report COINS 89-89, University of Massachusetts, Amherst MA 01003.

Barto et al., 1983
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983).
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846.

Barzdin, 1988
Barzdin, Y. M. (1988).
Algorithmic information theory.
In Encyclopaedia of Mathematics, volume 1, pages 140-142. D. Reidel/Kluwer Academic Publishers.

Baum and Haussler, 1989
Baum, E. B. and Haussler, D. (1989).
What size net gives valid generalization?
Neural Computation, 1(1):151-160.

Becker, 1991
Becker, S. (1991).
Unsupervised learning procedures for neural networks.
International Journal of Neural Systems, 2(1 & 2):17-33.

Bennett, 1988
Bennett, C. H. (1988).
Logical depth and physical complexity.
In Herken, R., editor, The Universal Turing Machine: A Half-Century Survey, volume 1, pages 227-258. Oxford University Press, Oxford, and Kammerer & Unverzagt, Hamburg.

Blumer et al., 1987
Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1987).
Occam's razor.
Information Processing Letters, 24:377-380.

Chaitin, 1966
Chaitin, G. (1966).
On the length of programs for computing finite binary sequences.
Journal of the ACM, 13:547-569.

Chaitin, 1969
Chaitin, G. (1969).
On the length of programs for computing finite binary sequences: statistical considerations.
Journal of the ACM, 16:145-159.

Chaitin, 1975
Chaitin, G. (1975).
A theory of program size formally identical to information theory.
Journal of the ACM, 22:329-340.

Chaitin, 1987
Chaitin, G. (1987).
Algorithmic Information Theory.
Cambridge University Press, Cambridge.

Cover et al., 1989
Cover, T. M., Gács, P., and Gray, R. M. (1989).
Kolmogorov's contributions to information theory and algorithmic complexity.
The Annals of Probability, 17:840-865.

Dayan and Sejnowski, 1994
Dayan, P. and Sejnowski, T. (1994).
TD$(\lambda)$ converges with probability $1$.
Machine Learning, 14:295-301.

Deco et al., 1993
Deco, G., Finnoff, W., and Zimmermann, H. G. (1993).
Elimination of overtraining by a mutual information network.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 744-749. Springer.

Dietterich, 1989
Dietterich, T. G. (1989).
Limitations of inductive learning.
In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, NY, pages 124-128. San Mateo, CA: Morgan Kaufmann.

Gács, 1974
Gács, P. (1974).
On the symmetry of algorithmic information.
Soviet Math. Dokl., 15:1477-1480.

Gallant, 1990
Gallant, S. I. (1990).
A connectionist learning algorithm with provable generalization and scaling bounds.
Neural Networks, 3:191-201.

Gao and Li, 1989
Gao, Q. and Li, M. (1989).
The minimum description length principle and its application to online learning of handprinted characters.
In Proc. 11th International Joint Conference on Artificial Intelligence, Detroit, MI, pages 843-848.

Guyon et al., 1992
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992).
Structural risk minimization for character recognition.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. San Mateo, CA: Morgan Kaufmann.

Hartmanis, 1983
Hartmanis, J. (1983).
Generalized Kolmogorov complexity and the structure of feasible computations.
In Proc. 24th IEEE Symposium on Foundations of Computer Science, pages 439-445.

Hassibi and Stork, 1993
Hassibi, B. and Stork, D. G. (1993).
Second order derivatives for network pruning: Optimal brain surgeon.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. San Mateo, CA: Morgan Kaufmann.

Haussler, 1988
Haussler, D. (1988).
Quantifying inductive bias: AI learning algorithms and Valiant's learning framework.
Artificial Intelligence, 36:177-221.

Heil, 1995
Heil, S. (1995).
Universelle Suche und inkrementelles Lernen (Universal search and incremental learning).
Diploma thesis, Fakultät für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München.

Hinton and van Camp, 1993
Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11-18. Springer.

Hochreiter and Schmidhuber, 1995
Hochreiter, S. and Schmidhuber, J. (1995).
Simplifying neural nets by discovering flat minima.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT Press, Cambridge MA.

Hochreiter and Schmidhuber, 1996
Hochreiter, S. and Schmidhuber, J. (1996).
Flat minima.
Neural Computation.
In press. An extended version is available from the WWW home pages of Hochreiter and Schmidhuber.

Kolmogorov, 1965
Kolmogorov, A. (1965).
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:1-11.

Kolmogorov, 1933
Kolmogorov, A. N. (1933).
Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability).
Springer, Berlin.

Krogh and Hertz, 1992
Krogh, A. and Hertz, J. A. (1992).
A simple weight decay can improve generalization.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. San Mateo, CA: Morgan Kaufmann.

LeCun, 1985
LeCun, Y. (1985).
Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning procedure for an asymmetric threshold network).
In Proceedings of Cognitiva 85, Paris, pages 599-604.

LeCun et al., 1991
LeCun, Y., Kanter, I., and Solla, S. A. (1991).
Second order properties of error surfaces: Learning time and generalization.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 918-924. San Mateo, CA: Morgan Kaufmann.

Levin, 1973a
Levin, L. A. (1973a).
On the notion of a random sequence.
Soviet Math. Dokl., 14(5):1413-1416.

Levin, 1973b
Levin, L. A. (1973b).
Universal sequential search problems.
Problems of Information Transmission, 9(3):265-266.

Levin, 1974
Levin, L. A. (1974).
Laws of information (nongrowth) and aspects of the foundation of probability theory.
Problems of Information Transmission, 10(3):206-210.

Levin, 1976
Levin, L. A. (1976).
Various measures of complexity for finite objects (axiomatic description).
Soviet Math. Dokl., 17(2):522-526.

Levin, 1984
Levin, L. A. (1984).
Randomness conservation inequalities: Information and independence in mathematical theories.
Information and Control, 61:15-37.

Li and Vitányi, 1989
Li, M. and Vitányi, P. M. B. (1989).
A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution.
In Proc. 30th Annual IEEE Symposium on Foundations of Computer Science, pages 34-39.

Li and Vitányi, 1993
Li, M. and Vitányi, P. M. B. (1993).
An Introduction to Kolmogorov Complexity and its Applications.
Springer.

Linsker, 1988
Linsker, R. (1988).
Self-organization in a perceptual network.
IEEE Computer, 21:105-117.

Maass, 1994
Maass, W. (1994).
Perspectives of current research about the complexity of learning on neural nets.
In Roychowdhury, V. P., Siu, K. Y., and Orlitsky, A., editors, Theoretical Advances in Neural Computation and Learning. Kluwer Academic Publishers.

MacKay, 1992
MacKay, D. J. C. (1992).
A practical Bayesian framework for backprop networks.
Neural Computation, 4:448-472.

Martin-Löf, 1966
Martin-Löf, P. (1966).
The definition of random sequences.
Information and Control, 9:602-619.

Milosavljevic and Jurka, 1993
Milosavljevic, A. and Jurka, J. (1993).
Discovery by minimal length encoding: A case study in molecular evolution.
Machine Learning, 12:69-87.

Moody, 1992
Moody, J. E. (1992).
The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. San Mateo, CA: Morgan Kaufmann.

Mozer and Smolensky, 1989
Mozer, M. C. and Smolensky, P. (1989).
Skeletonization: A technique for trimming the fat from a network via relevance assessment.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107-115. San Mateo, CA: Morgan Kaufmann.

Nowlan and Hinton, 1992
Nowlan, S. J. and Hinton, G. E. (1992).
Simplifying neural networks by soft weight-sharing.
Neural Computation, 4(4):473-493.

Parker, 1985
Parker, D. B. (1985).
Learning-logic.
Technical Report TR-47, Center for Comp. Research in Economics and Management Sci., MIT.

Paul and Solomonoff, 1991
Paul, W. and Solomonoff, R. J. (1991).
Autonomous theory building systems.
Manuscript, revised 1994.

Pearlmutter and Rosenfeld, 1991
Pearlmutter, B. A. and Rosenfeld, R. (1991).
Chaitin-Kolmogorov complexity and generalization in neural networks.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. San Mateo, CA: Morgan Kaufmann.

Pednault, 1989
Pednault, E. P. D. (1989).
Some experiments in applying inductive inference principles to surface reconstruction.
In 11th IJCAI, pages 1603-1609. San Mateo, CA: Morgan Kaufmann.

Quinlan and Rivest, 1989
Quinlan, J. R. and Rivest, R. L. (1989).
Inferring decision trees using the minimum description length principle.
Information and Computation, 80:227-248.

Rissanen, 1978
Rissanen, J. (1978).
Modeling by shortest data description.
Automatica, 14:465-471.

Rissanen, 1983
Rissanen, J. (1983).
A universal prior for integers and estimation by minimum description length.
The Annals of Statistics, 11(2):416-431.

Rissanen, 1986
Rissanen, J. (1986).
Stochastic complexity and modeling.
The Annals of Statistics, 14(3):1080-1100.

Rumelhart et al., 1986
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning internal representations by error propagation.
In Parallel Distributed Processing, volume 1, pages 318-362. MIT Press.

Schaffer, 1993
Schaffer, C. (1993).
Overfitting avoidance as bias.
Machine Learning, 10:153-178.

Schmidhuber, 1989
Schmidhuber, J. (1989).
The Neural Bucket Brigade: A local learning algorithm for dynamic feedforward and recurrent networks.
Connection Science, 1(4):403-412.

Schmidhuber, 1992a
Schmidhuber, J. (1992a).
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242.

Schmidhuber, 1992b
Schmidhuber, J. (1992b).
Learning factorial codes by predictability minimization.
Neural Computation, 4(6):863-879.

Schmidhuber, 1993a
Schmidhuber, J. (1993a).
On decreasing the ratio between learning complexity and number of time-varying variables in fully recurrent nets.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 460-463. Springer.

Schmidhuber, 1993b
Schmidhuber, J. (1993b).
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer.

Schmidhuber, 1994
Schmidhuber, J. (1994).
Discovering problem solutions with low Kolmogorov complexity and high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München.
Short version in Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 488-496. Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Schmidhuber, 1995
Schmidhuber, J. (1995).
Low-complexity art.
Accepted by Leonardo, Journal of the International Society for the Arts, Sciences, and Technology.

Schmidhuber, 1996
Schmidhuber, J. (1996).
A general method for incremental self-improvement and multi-agent learning in unrestricted environments.
In Yao, X., editor, Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore.
In press.

Schmidhuber et al., 1996
Schmidhuber, J., Zhao, J., and Wiering, M. (1996).
Simple principles of metalearning.
Technical Report IDSIA-69-96, IDSIA.

Schnorr, 1971
Schnorr, C. P. (1971).
A unified approach to the definition of random sequences.
Mathematical Systems Theory, 5:246-258.

Shannon, 1948
Shannon, C. E. (1948).
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, XXVII:379-423 and 623-656.

Solomonoff, 1964
Solomonoff, R. (1964).
A formal theory of inductive inference. Part I.
Information and Control, 7:1-22.

Solomonoff, 1986
Solomonoff, R. (1986).
An application of algorithmic probability to problems in artificial intelligence.
In Kanal, L. N. and Lemmer, J. F., editors, Uncertainty in Artificial Intelligence, pages 473-491. Elsevier Science Publishers.

Utgoff, 1986
Utgoff, P. (1986).
Shift of bias for inductive concept learning.
In Michalski, R., Carbonell, J., and Mitchell, T., editors, Machine Learning, volume 2, pages 163-190. Morgan Kaufmann, Los Altos, CA.

Valiant, 1984
Valiant, L. G. (1984).
A theory of the learnable.
Communications of the ACM, 27:1134-1142.

Vapnik, 1992
Vapnik, V. (1992).
Principles of risk minimization for learning theory.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. San Mateo, CA: Morgan Kaufmann.

Wallace and Boulton, 1968
Wallace, C. S. and Boulton, D. M. (1968).
An information measure for classification.
Computer Journal, 11(2):185-194.

Watanabe, 1992
Watanabe, O., editor (1992).
Kolmogorov complexity and computational complexity.
EATCS Monographs on Theoretical Computer Science, Springer.

Watkins, 1989
Watkins, C. (1989).
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge.

Weigend et al., 1990
Weigend, A. S., Huberman, B. A., and Rumelhart, D. E. (1990).
Predicting the future: A connectionist approach.
International Journal of Neural Systems, 1:193-209.

Werbos, 1974
Werbos, P. J. (1974).
Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.
PhD thesis, Harvard University.

Wiering and Schmidhuber, 1996
Wiering, M. and Schmidhuber, J. (1996).
Solving POMDPs with Levin search and EIRA.
In Saitta, L., editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 534-542. Morgan Kaufmann Publishers, San Francisco, CA.

Williams, 1988
Williams, R. J. (1988).
Toward a theory of reinforcement-learning connectionist systems.
Technical Report NU-CCS-88-3, College of Comp. Sci., Northeastern University, Boston, MA.

Wolpert, 1993
Wolpert, D. H. (1993).
Technical Report SFI TR 93-03-016, Santa Fe Institute, NM 87501.

Zhao and Schmidhuber, 1996
Zhao, J. and Schmidhuber, J. (1996).
Incremental self-improvement for life-time multi-agent reinforcement learning.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson, S. W., editors, From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pages 516-525. MIT Press, Bradford Books.

Zvonkin and Levin, 1970
Zvonkin, A. K. and Levin, L. A. (1970).
The complexity of finite objects and the algorithmic concepts of information and randomness.
Russian Math. Surveys, 25(6):83-124.


