Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; Driessche, George van den; Schrittwieser, Julian; Antonoglou, Ioannis et al. (28 January 2016). "Mastering the game of Go with deep neural networks and tree search". Nature 529 (7587): 484–489. Bibcode: 2016Natur.529..484S. doi:10.1038/nature16961. ISSN 0028-0836. PMID 26819042.
Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0262193986.
Lin, Long-Ji; Mitchell, Tom M. (1993). "Reinforcement Learning with Hidden States". From Animals to Animats. Vol. 2. pp. 271–280.
Onat, Ahmet; Kita, Hajime (1998). "Q-learning with Recurrent Neural Networks as a Controller for the Inverted Pendulum Problem". The 5th International Conference on Neural Information Processing (ICONIP). pp. 837–840.
Onat, Ahmet; Kita, Hajime (1998). "Recurrent Neural Networks for Reinforcement Learning: Architecture, Learning Algorithms and Internal Representation". International Joint Conference on Neural Networks (IJCNN). pp. 2010–2015. doi:10.1109/IJCNN.1998.687168.
Shibata, Katsunari (7 March 2017). "Functions that Emerge through End-to-End Reinforcement Learning". arXiv:1703.02239 [cs.AI].
Shibata, Katsunari (10 March 2017). "Communications that Emerge through Reinforcement Learning Using a (Recurrent) Neural Network". arXiv:1703.03543 [cs.AI].