
Bibliography

Adleman, 1979
Adleman, L. (1979).
Time, space, and randomness.
Technical Report MIT/LCS/79/TM-131, Laboratory for Computer Science, MIT.

Allender, 1992
Allender, E. (1992).
Applications of time-bounded Kolmogorov complexity in complexity theory.
In Watanabe, O., editor, Kolmogorov complexity and computational complexity, pages 6-22. EATCS Monographs on Theoretical Computer Science, Springer.

Amari and Murata, 1993
Amari, S. and Murata, N. (1993).
Statistical theory of learning curves under entropic loss criterion.
Neural Computation, 5(1):140-153.

Atick et al., 1992
Atick, J. J., Li, Z., and Redlich, A. N. (1992).
Understanding retinal color coding from first principles.
Neural Computation, 4:559-572.

Barlow, 1989
Barlow, H. B. (1989).
Unsupervised learning.
Neural Computation, 1(3):295-311.

Barron, 1988
Barron, A. R. (1988).
Complexity regularization with application to artificial neural networks.
In Nonparametric Functional Estimation and Related Topics, pages 561-576. Kluwer Academic Publishers.

Barto, 1989
Barto, A. G. (1989).
Connectionist approaches for control.
Technical Report COINS 89-89, University of Massachusetts, Amherst MA 01003.

Barto et al., 1983
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983).
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846.

Barzdin, 1988
Barzdin, Y. M. (1988).
Algorithmic information theory.
In Encyclopaedia of Mathematics, volume 1, pages 140-142. D. Reidel/Kluwer Academic Publishers.

Baum and Haussler, 1989
Baum, E. B. and Haussler, D. (1989).
What size net gives valid generalization?
Neural Computation, 1(1):151-160.

Becker, 1991
Becker, S. (1991).
Unsupervised learning procedures for neural networks.
International Journal of Neural Systems, 2(1 & 2):17-33.

Bennett, 1988
Bennett, C. H. (1988).
Logical depth and physical complexity.
In Herken, R., editor, The Universal Turing Machine: A Half-Century Survey, volume 1, pages 227-258. Oxford University Press, Oxford, and Kammerer & Unverzagt, Hamburg.

Blumer et al., 1987
Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1987).
Occam's razor.
Information Processing Letters, 24:377-380.

Chaitin, 1966
Chaitin, G. (1966).
On the length of programs for computing finite binary sequences.
Journal of the ACM, 13:547-569.

Chaitin, 1969
Chaitin, G. (1969).
On the length of programs for computing finite binary sequences: statistical considerations.
Journal of the ACM, 16:145-159.

Chaitin, 1975
Chaitin, G. (1975).
A theory of program size formally identical to information theory.
Journal of the ACM, 22:329-340.

Chaitin, 1987
Chaitin, G. (1987).
Algorithmic Information Theory.
Cambridge University Press, Cambridge.

Cover et al., 1989
Cover, T. M., Gács, P., and Gray, R. M. (1989).
Kolmogorov's contributions to information theory and algorithmic complexity.
The Annals of Probability, 17:840-865.

Dayan and Sejnowski, 1994
Dayan, P. and Sejnowski, T. (1994).
TD$(\lambda)$ converges with probability $1$.
Machine Learning, 14:295-301.

Deco et al., 1993
Deco, G., Finnoff, W., and Zimmermann, H. G. (1993).
Elimination of overtraining by a mutual information network.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 744-749. Springer.

Dietterich, 1989
Dietterich, T. G. (1989).
Limitations of inductive learning.
In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, NY, pages 124-128. San Mateo, CA: Morgan Kaufmann.

Gács, 1974
Gács, P. (1974).
On the symmetry of algorithmic information.
Soviet Math. Dokl., 15:1477-1480.

Gallant, 1990
Gallant, S. I. (1990).
A connectionist learning algorithm with provable generalization and scaling bounds.
Neural Networks, 3:191-201.

Gao and Li, 1989
Gao, Q. and Li, M. (1989).
The minimum description length principle and its application to online learning of handprinted characters.
In Proc. 11th International Joint Conference on Artificial Intelligence, Detroit, MI, pages 843-848.

Guyon et al., 1992
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992).
Structural risk minimization for character recognition.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 471-479. San Mateo, CA: Morgan Kaufmann.

Hartmanis, 1983
Hartmanis, J. (1983).
Generalized Kolmogorov complexity and the structure of feasible computations.
In Proc. 24th IEEE Symposium on Foundations of Computer Science, pages 439-445.

Hassibi and Stork, 1993
Hassibi, B. and Stork, D. G. (1993).
Second order derivatives for network pruning: Optimal brain surgeon.
In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 164-171. San Mateo, CA: Morgan Kaufmann.

Haussler, 1988
Haussler, D. (1988).
Quantifying inductive bias: AI learning algorithms and Valiant's learning framework.
Artificial Intelligence, 36:177-221.

Heil, 1995
Heil, S. (1995).
Universelle Suche und inkrementelles Lernen (Universal search and incremental learning).
Diploma thesis, Fakultät für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München.

Hinton and van Camp, 1993
Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11-18. Springer.

Hochreiter and Schmidhuber, 1995
Hochreiter, S. and Schmidhuber, J. (1995).
Simplifying neural nets by discovering flat minima.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 529-536. MIT Press, Cambridge MA.

Hochreiter and Schmidhuber, 1996
Hochreiter, S. and Schmidhuber, J. (1996).
Flat minima.
Neural Computation.
In press. An extended version is available from the WWW home pages of Hochreiter and Schmidhuber.

Kolmogorov, 1965
Kolmogorov, A. (1965).
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:1-11.

Kolmogorov, 1933
Kolmogorov, A. N. (1933).
Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability).
Springer, Berlin.

Krogh and Hertz, 1992
Krogh, A. and Hertz, J. A. (1992).
A simple weight decay can improve generalization.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 950-957. San Mateo, CA: Morgan Kaufmann.

LeCun, 1985
LeCun, Y. (1985).
Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning procedure for an asymmetric threshold network).
In Proceedings of Cognitiva 85, Paris, pages 599-604.

LeCun et al., 1991
LeCun, Y., Kanter, I., and Solla, S. A. (1991).
Second order properties of error surfaces: Learning time and generalization.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 918-924. San Mateo, CA: Morgan Kaufmann.

Levin, 1973a
Levin, L. A. (1973a).
On the notion of a random sequence.
Soviet Math. Dokl., 14(5):1413-1416.

Levin, 1973b
Levin, L. A. (1973b).
Universal sequential search problems.
Problems of Information Transmission, 9(3):265-266.

Levin, 1974
Levin, L. A. (1974).
Laws of information (nongrowth) and aspects of the foundation of probability theory.
Problems of Information Transmission, 10(3):206-210.

Levin, 1976
Levin, L. A. (1976).
Various measures of complexity for finite objects (axiomatic description).
Soviet Math. Dokl., 17(2):522-526.

Levin, 1984
Levin, L. A. (1984).
Randomness conservation inequalities: Information and independence in mathematical theories.
Information and Control, 61:15-37.

Li and Vitányi, 1989
Li, M. and Vitányi, P. M. B. (1989).
A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution.
In Proc. 30th Annual IEEE Symposium on Foundations of Computer Science, pages 34-39.

Li and Vitányi, 1993
Li, M. and Vitányi, P. M. B. (1993).
An Introduction to Kolmogorov Complexity and its Applications.
Springer.

Linsker, 1988
Linsker, R. (1988).
Self-organization in a perceptual network.
IEEE Computer, 21:105-117.

Maass, 1994
Maass, W. (1994).
Perspectives of current research about the complexity of learning on neural nets.
In Roychowdhury, V. P., Siu, K. Y., and Orlitsky, A., editors, Theoretical Advances in Neural Computation and Learning. Kluwer Academic Publishers.

MacKay, 1992
MacKay, D. J. C. (1992).
A practical Bayesian framework for backprop networks.
Neural Computation, 4:448-472.

Martin-Löf, 1966
Martin-Löf, P. (1966).
The definition of random sequences.
Information and Control, 9:602-619.

Milosavljevic and Jurka, 1993
Milosavljevic, A. and Jurka, J. (1993).
Discovery by minimal length encoding: A case study in molecular evolution.
Machine Learning, 12:69-87.

Moody, 1992
Moody, J. E. (1992).
The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 847-854. San Mateo, CA: Morgan Kaufmann.

Mozer and Smolensky, 1989
Mozer, M. C. and Smolensky, P. (1989).
Skeletonization: A technique for trimming the fat from a network via relevance assessment.
In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 107-115. San Mateo, CA: Morgan Kaufmann.

Nowlan and Hinton, 1992
Nowlan, S. J. and Hinton, G. E. (1992).
Simplifying neural networks by soft weight-sharing.
Neural Computation, 4(4):473-493.

Parker, 1985
Parker, D. B. (1985).
Learning-logic.
Technical Report TR-47, Center for Comp. Research in Economics and Management Sci., MIT.

Paul and Solomonoff, 1991
Paul, W. and Solomonoff, R. J. (1991).
Autonomous theory building systems.
Manuscript, revised 1994.

Pearlmutter and Rosenfeld, 1991
Pearlmutter, B. A. and Rosenfeld, R. (1991).
Chaitin-Kolmogorov complexity and generalization in neural networks.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 925-931. San Mateo, CA: Morgan Kaufmann.

Pednault, 1989
Pednault, E. P. D. (1989).
Some experiments in applying inductive inference principles to surface reconstruction.
In 11th IJCAI, pages 1603-1609. San Mateo, CA: Morgan Kaufmann.

Quinlan and Rivest, 1989
Quinlan, J. R. and Rivest, R. L. (1989).
Inferring decision trees using the minimum description length principle.
Information and Computation, 80:227-248.

Rissanen, 1978
Rissanen, J. (1978).
Modeling by shortest data description.
Automatica, 14:465-471.

Rissanen, 1983
Rissanen, J. (1983).
A universal prior for integers and estimation by minimum description length.
The Annals of Statistics, 11(2):416-431.

Rissanen, 1986
Rissanen, J. (1986).
Stochastic complexity and modeling.
The Annals of Statistics, 14(3):1080-1100.

Rumelhart et al., 1986
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning internal representations by error propagation.
In Parallel Distributed Processing, volume 1, pages 318-362. MIT Press.

Schaffer, 1993
Schaffer, C. (1993).
Overfitting avoidance as bias.
Machine Learning, 10:153-178.

Schmidhuber, 1989
Schmidhuber, J. (1989).
The Neural Bucket Brigade: A local learning algorithm for dynamic feedforward and recurrent networks.
Connection Science, 1(4):403-412.

Schmidhuber, 1992a
Schmidhuber, J. (1992a).
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242.

Schmidhuber, 1992b
Schmidhuber, J. (1992b).
Learning factorial codes by predictability minimization.
Neural Computation, 4(6):863-879.

Schmidhuber, 1993a
Schmidhuber, J. (1993a).
On decreasing the ratio between learning complexity and number of time-varying variables in fully recurrent nets.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 460-463. Springer.

Schmidhuber, 1993b
Schmidhuber, J. (1993b).
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer.

Schmidhuber, 1994
Schmidhuber, J. (1994).
Discovering problem solutions with low Kolmogorov complexity and high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München.
Short version in Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 488-496. Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Schmidhuber, 1995
Schmidhuber, J. (1995).
Low-complexity art.
Accepted by Leonardo, Journal of the International Society for the Arts, Sciences, and Technology.

Schmidhuber, 1996
Schmidhuber, J. (1996).
A general method for incremental self-improvement and multi-agent learning in unrestricted environments.
In Yao, X., editor, Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore.
In press.

Schmidhuber et al., 1996
Schmidhuber, J., Zhao, J., and Wiering, M. (1996).
Simple principles of metalearning.
Technical Report IDSIA-69-96, IDSIA.

Schnorr, 1971
Schnorr, C. P. (1971).
A unified approach to the definition of random sequences.
Mathematical Systems Theory, 5:246-258.

Shannon, 1948
Shannon, C. E. (1948).
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, XXVII:379-423 and 623-656.

Solomonoff, 1964
Solomonoff, R. (1964).
A formal theory of inductive inference. Part I.
Information and Control, 7:1-22.

Solomonoff, 1986
Solomonoff, R. (1986).
An application of algorithmic probability to problems in artificial intelligence.
In Kanal, L. N. and Lemmer, J. F., editors, Uncertainty in Artificial Intelligence, pages 473-491. Elsevier Science Publishers.

Utgoff, 1986
Utgoff, P. (1986).
Shift of bias for inductive concept learning.
In Michalski, R., Carbonell, J., and Mitchell, T., editors, Machine Learning, volume 2, pages 163-190. Morgan Kaufmann, Los Altos, CA.

Valiant, 1984
Valiant, L. G. (1984).
A theory of the learnable.
Communications of the ACM, 27:1134-1142.

Vapnik, 1992
Vapnik, V. (1992).
Principles of risk minimization for learning theory.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 831-838. San Mateo, CA: Morgan Kaufmann.

Wallace and Boulton, 1968
Wallace, C. S. and Boulton, D. M. (1968).
An information measure for classification.
Computer Journal, 11(2):185-194.

Watanabe, 1992
Watanabe, O., editor (1992).
Kolmogorov complexity and computational complexity.
EATCS Monographs on Theoretical Computer Science, Springer.

Watkins, 1989
Watkins, C. (1989).
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge.

Weigend et al., 1990
Weigend, A. S., Huberman, B. A., and Rumelhart, D. E. (1990).
Predicting the future: A connectionist approach.
International Journal of Neural Systems, 1:193-209.

Werbos, 1974
Werbos, P. J. (1974).
Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.
PhD thesis, Harvard University.

Wiering and Schmidhuber, 1996
Wiering, M. and Schmidhuber, J. (1996).
Solving POMDPs with Levin search and EIRA.
In Saitta, L., editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 534-542. Morgan Kaufmann Publishers, San Francisco, CA.

Williams, 1988
Williams, R. J. (1988).
Toward a theory of reinforcement-learning connectionist systems.
Technical Report NU-CCS-88-3, College of Comp. Sci., Northeastern University, Boston, MA.

Wolpert, 1993
Wolpert, D. H. (1993).
Technical Report SFI TR 93-03-016, Santa Fe Institute, NM 87501.

Zhao and Schmidhuber, 1996
Zhao, J. and Schmidhuber, J. (1996).
Incremental self-improvement for life-time multi-agent reinforcement learning.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson, S. W., editors, From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pages 516-525. MIT Press, Bradford Books.

Zvonkin and Levin, 1970
Zvonkin, A. K. and Levin, L. A. (1970).
The complexity of finite objects and the algorithmic concepts of information and randomness.
Russian Math. Surveys, 25(6):83-124.


