  1. Bruner, J. S. (1966). Toward a Theory of Instruction. Harvard University Press.
  2. Bruner, J. S. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21.
  3. Clark, J. M., & Paivio, A. (1991). Dual coding theory and education. Educational Psychology Review, 3(3), 149–210.
  4. Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15(1), 1–38.
  5. Wammes, J. D., Meade, M. E., & Fernandes, M. A. (2016). The drawing effect: Evidence for reliable and robust memory benefits in free recall. Quarterly Journal of Experimental Psychology, 69(9), 1752–1776.
  1. Verhulst, P. F. (1838). Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique, 10, 113–121.
  2. Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Firmin Didot.
  3. Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium. Perthes and Besser.
  4. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
  5. Berkson, J. (1944). Application of the logistic function to bio-assay. Journal of the American Statistical Association, 39(227), 357–365.
  6. Fix, E., & Hodges, J. L. (1951). Discriminatory analysis: Nonparametric discrimination, consistency properties (Report No. 4). USAF School of Aviation Medicine, Randolph Field.
  7. Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, 4(12), 801–804.
  8. Bellman, R. (1957). Dynamic Programming. Princeton University Press.
  9. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organisation in the brain. Psychological Review, 65(6), 386–408.
  10. Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B, 20(2), 215–242.
  11. Sokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38(22), 1409–1438.
  12. Cleverdon, C. W. (1960). Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. College of Aeronautics, Cranfield.
  13. Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. Journal of Physiology, 160(1), 106–154.
  14. Ward, J. H. (1963). Hierarchical grouping to optimise an objective function. Journal of the American Statistical Association, 58(301), 236–244.
  15. Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
  16. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281–297.
  17. Mosteller, F., & Tukey, J. W. (1968). Data analysis, including statistics. In G. Lindzey & E. Aronson (Eds.), Handbook of Social Psychology (Vol. 2). Addison-Wesley.
  18. Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
  19. Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences (Doctoral dissertation). Harvard University.
  20. Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B, 36(2), 111–133.
  21. Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350), 320–328.
  22. Lupi, G., & Posavec, S. (2018). Observe, Collect, Draw! A Visual Journal. Princeton Architectural Press.
  23. Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.
  24. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
  25. Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
  26. Jordan, M. I. (1986). Serial order: A parallel distributed processing approach (Technical Report ICS-8604). University of California, San Diego.
  27. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
  28. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
  29. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.
  30. Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation). University of Cambridge.
  31. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
  32. Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
  33. Burkov, A. (2019). The Hundred-Page Machine Learning Book. Andriy Burkov.
  34. Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O’Reilly Media.
  35. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  36. Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalisation of on-line learning and an application to boosting. Proceedings of the Second European Conference on Computational Learning Theory, 23–37.
  37. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
  38. Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 226–231.
  39. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
  40. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
  41. Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
  42. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  43. Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN), 850–855.
  44. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  45. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
  46. Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. Proceedings of the Second International Conference on Learning Representations (ICLR).
  47. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
  48. Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning (ICML), 2256–2265.
  49. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
  50. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
  51. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
  52. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  53. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.
  54. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI.
  55. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33.