# Foundational Papers in Neural Networks

Also see [Semantic Word Embeddings / Word Vectors](https://hackmd.io/@adaburrows/B1_cPiBt6) and [Software for Artificial Neural Networks](https://hackmd.io/@adaburrows/BkhcCiNY6).

## Background and Overview

- [MacKay, D. J. C. (2003). Information theory, inference and learning algorithms. Cambridge university press.](https://www.inference.org.uk/itprnn/book.pdf) — a pretty good overview of the information theory needed to build an understanding of artificial neural networks. Also has content about various neural networks.

## Historical Papers

For a more complete list, see https://people.idsia.ch/~juergen/deep-learning-history.html

- **[Spin-glass](https://en.wikipedia.org/w/index.php?title=Spin_glass&oldid=1089309646) and [Ising Model](https://en.wikipedia.org/w/index.php?title=Ising_model&oldid=1087414113)** (non-learning RNN, from physics; a toy simulation sketch follows this group)
  - W. Lenz (1920). Beitrag zum Verständnis der magnetischen Erscheinungen in festen Körpern. (Contribution to the understanding of magnetic phenomena in solid bodies.) Physikalische Zeitschrift, 21:613-615.
  - E. Ising (1925). Beitrag zur Theorie des Ferro- und Paramagnetismus. (Contribution to the theory of ferro- and paramagnetism.) Dissertation, 1924.
  - E. Ising (1925). Beitrag zur Theorie des Ferromagnetismus. (Contribution to the theory of ferromagnetism.) Z. Phys., 31 (1): 253-258, 1925.
  - H. A. Kramers and G. H. Wannier (1941). Statistics of the Two-Dimensional Ferromagnet. Phys. Rev. 60, 252 and 263, 1941.
  - G. H. Wannier (1945). The Statistical Problem in Cooperative Phenomena. Rev. Mod. Phys. 17, 50.
  - [Brush, S. G. (1967). History of the Lenz-Ising model. Reviews of modern physics, 39(4), 883.](https://web.archive.org/web/20221208092414/http://personal.rhul.ac.uk/uhap/027/ph4211/PH4211_files/brush67.pdf)
  - [Little, W. A. (1974). The existence of persistent states in the brain. Mathematical biosciences, 19(1-2), 101-120.](https://web.archive.org/web/20230815031215/http://wexler.free.fr/library/files/little%20(1974)%20the%20existence%20of%20persistent%20states%20in%20the%20brain.pdf)
  - [Little, W. A. (1980). An Ising model of a neural network. In Biological Growth and Spread (pp. 173-179). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-61850-5_18)
  - [Amit, Daniel J., Hanoch Gutfreund, and Haim Sompolinsky. "Spin-glass models of neural networks." Physical Review A 32.2 (1985): 1007.](https://www.researchgate.net/publication/283617465_Spin-glass_models_of_neural_networks)
  - [van Hemmen, J. L. (1986). Spin-glass models of a neural network. Physical Review A, 34(4), 3435.](https://doi.org/10.1103/PhysRevA.34.3435)
  - [Gardner, E., & Derrida, B. (1988). Optimal storage properties of neural network models. Journal of Physics A: Mathematical and general, 21(1), 271.](https://hal.archives-ouvertes.fr/hal-03285587/file/Optimal%20storage%20properties%20of%20neural%20network%20models.pdf)
  - [Sherrington, D. (1993). Neural networks: the spin glass approach. In North-Holland Mathematical Library (Vol. 51, pp. 261-291). Elsevier.](https://web.archive.org/web/20231229080726/https://d1wqtxts1xzle7.cloudfront.net/43229638/On-Line_Learning_Processes_in_Artificial20160301-1491-l4cs33-libre.pdf?1456822244=&response-content-disposition=inline%3B+filename%3DOn_line_learning_processes_in_artificial.pdf&Expires=1703840752&Signature=Ch31UDVvbVhObMEIm6S--bKnaRILBrdw6uMgMHhkWaBS04EMkf4c7~o52K7MCzF6y8IzZQgeWVU4xxwouCV7qGQHhdHMRIN5dUsHpERepfYXh3ugRBm77Rcs3BMnyFkoFNX2NIUXiTXyZEFGKgN882bYggTIOn8HcDo4lsDoTjA8fVO53qK~W4DID4UjB0WGDTwfsS4xUmAPtP3B3A3wL9olzDMyYiVYCqs~WnoxWF0ZmZD~lX51LgD4K8x35J9qOKPKf7MMqljD6mqkcXrx1ZBfvxuq66PDsCHMZhxU12jzYGjEeEpzmJXcP8KFVkhG8OJvB7ty2TpSH9x2aO-eag__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=270)
  - [Schaap, H. G. (2005). Ising models and neural networks. University Library Groningen.](http://www.yaroslavvb.com/papers/schaap-ising.pdf)
  - [Niss, M. (2005). History of the Lenz-Ising model 1920–1950: from ferromagnetic to cooperative phenomena. Archive for history of exact sciences, 59, 267-318.](https://web.archive.org/web/20211022100509/https://verga.cpt.univ-mrs.fr/pdfs/Niss-2005.pdf)
  - [Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007-1012.](https://doi.org/10.1038/nature04701)
  - [Roudi, Y., Tyrcha, J., & Hertz, J. (2009). Ising model for neural data: model quality and approximate methods for extracting functional connectivity. Physical Review E, 79(5), 051915.](https://doi.org/10.1103/physreve.79.051915)
  - [Yoshioka, M. (2009). Learning of spatiotemporal patterns in Ising-spin neural networks: analysis of storage capacity by path integral methods. Physical review letters, 102(15), 158102.](https://doi.org/10.1103/physrevlett.102.158102)
  - [Witoelar, A., & Roudi, Y. (2011). Neural network reconstruction using kinetic Ising models with memory. BMC Neuroscience, 12(1), 1-2.](https://doi.org/10.1186/1471-2202-12-S1-P274)
  - [Pai, S. (2016) Convolutional Neural Networks Arise From Ising Models and Restricted Boltzmann Machines. Stanford University, APPPHYS 293 Term Paper. 7 June.](https://web.stanford.edu/~sunilpai/convolutional-neural-networks.pdf)
  - [Yamamoto, Y., Aihara, K., Leleu, T., Kawarabayashi, K. I., Kako, S., Fejer, M., ... & Takesue, H. (2017). Coherent Ising machines—Optical neural networks operating at the quantum limit. npj Quantum Information, 3(1), 1-15.](https://doi.org/10.1038/s41534-017-0048-9)
  - [Morningstar, A., & Melko, R. G. (2018). Deep Learning the Ising Model Near Criticality. Journal of Machine Learning Research, 18(163), 1-17.](https://dl.acm.org/doi/pdf/10.5555/3122009.3242020)
  - [Efthymiou, S., Beach, M. J., & Melko, R. G. (2019). Super-resolving the Ising model with convolutional neural networks. Physical Review B, 99(7), 075113.](https://doi.org/10.1103/PhysRevB.99.075113)
  - [Talalaev, D. V. (2020, October). Hopfield neural network and anisotropic Ising model. In International Conference on Neuroinformatics (pp. 381-386). Springer, Cham.](https://doi.org/10.1007/978-3-030-60577-3_45)
  - [D'Angelo, F., & Böttcher, L. (2020). Learning the Ising model with generative neural networks. arXiv preprint arXiv:2001.05361.](https://doi.org/10.1103/PhysRevResearch.2.023266)
  - [Aguilera, M., Moosavi, S. A., & Shimazaki, H. (2021). A unifying framework for mean-field theories of asymmetric kinetic Ising systems. Nature communications, 12(1), 1-12.](https://doi.org/10.1038/s41467-021-20890-5)
  - [Kara, O., Sehanobish, A., & Corzo, H. H. (2021). Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models. arXiv preprint arXiv:2109.13925.](https://arxiv.org/abs/2109.13925)
  - [Zhang, Y. (2021). Ising spin configurations with the deep learning method. Journal of Physics Communications, 5(1), 015006.](https://doi.org/10.1088/2399-6528/abd7c3)
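The Lenz-Ising model above is simple enough to play with directly. Below is a minimal, illustrative Metropolis-style sketch of a 2D Ising model in plain NumPy; the lattice size, temperature, and step count are arbitrary toy choices of this sketch, not taken from any of the papers cited above.

```python
import numpy as np

def metropolis_ising(L=16, T=2.0, steps=20_000, seed=0):
    """Toy Metropolis sampler for a 2D Ising model with periodic boundaries (J = 1)."""
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(steps):
        i, j = rng.integers(0, L, size=2)
        # Sum over the four nearest neighbours (periodic boundary conditions).
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2 * spins[i, j] * nb          # energy change if spin (i, j) is flipped
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1
    return spins

if __name__ == "__main__":
    lattice = metropolis_ising()
    print("mean magnetization:", lattice.mean())
```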
- **Considered first neural net model** — [McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of mathematical biophysics, 5.](https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf)
- [S.C. Kleene. Representation of Events in Nerve Nets and Finite Automata. Automata Studies, Editors: C.E. Shannon and J. McCarthy, Princeton University Press, p. 3-42, Princeton, N.J., 1956.](https://web.archive.org/web/20221207125436/https://apps.dtic.mil/sti/pdfs/ADA596138.pdf)
- [Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34, 43-98.](https://web.archive.org/web/20231128070228/https://personalpages.manchester.ac.uk/staff/nikolaos.kyparissas/uploads/VonNeumann1956.pdf)
  - Also see [Von Neumann, J., & Pierce, R. S. (1952). Lectures on probabilistic logics and the synthesis of reliable organisms from unreliable components. Pasadena, CA, USA: California institute of technology.](https://web.archive.org/web/20230623162146/https://web.mit.edu/6.454/www/papers/pierce_1952.pdf)
- **Perceptron** (a toy sketch of the perceptron and LMS update rules follows the Adaline/Madaline group below)
  - Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory.
  - Rosenblatt, F. The perceptron: A theory of statistical separability in cognitive systems. Buffalo: Cornell Aeronautical Laboratory, Inc. Rep. No. VG-1196-G-1, 1958.
  - [Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.](https://web.archive.org/web/20230506225610/http://www2.denizyuret.com/bib/rosenblatt/Rosenblatt1958/frosenblatt.pdf)
  - Joseph, R. D. (1961). Contributions to perceptron theory. PhD thesis, Cornell Univ.
  - K. Steinbuch. Die Lernmatrix. (The learning matrix.) Kybernetik, 1(1):36-45, 1961.
  - [Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan books.](https://apps.dtic.mil/sti/pdfs/AD0256582.pdf)
  - [Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological cybernetics, 59(4-5), 291-294.](https://infoscience.epfl.ch/record/82601)
  - [Anlauf, J. K., & Biehl, M. (1989). The adatron: an adaptive perceptron algorithm. Europhysics Letters, 10(7), 687.](https://iopscience.iop.org/article/10.1209/0295-5075/10/7/014)
  - [Gallant, S. I. (1990). Perceptron-based learning algorithms. IEEE Transactions on neural networks, 1(2), 179-191.](https://www.ling.upenn.edu/courses/cogs501/Gallant1990.pdf)
  - [Amaldi, E. (1991). On the complexity of training perceptrons. In Artificial Neural Networks: Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN'91) (Vol. 1, pp. 55-60). North-Holland.](https://hdl.handle.net/11311/677346)
  - [Frean, M. (1992). A "thermal" perceptron learning rule. Neural Computation, 4(6), 946-957.](https://doi.org/10.1162/neco.1992.4.6.946)
  - [Wendemuth, A. (1995). Learning the unlearnable. Journal of Physics A: Mathematical and General, 28(18), 5423.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=f5f164693bb2d9646619c912f51bce18bcab2a78)
  - [Freund, Y., & Schapire, R. E. (1998, July). Large margin classification using the perceptron algorithm. In Proceedings of the eleventh annual conference on Computational learning theory (pp. 209-217).](https://cseweb.ucsd.edu/~yfreund/papers/LargeMarginsUsingPerceptron.pdf)
  - [Frie, T. T., Cristianini, N., & Campbell, C. (1998, July). The kernel-adatron algorithm: a fast and simple learning procedure for support vector machines. In Machine learning: proceedings of the fifteenth international conference (ICML'98) (pp. 188-196).](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5e761bc3b6028308dcd48f9ba0964533c2e6fe43)
  - [Collins, M. (2002, July). Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 conference on empirical methods in natural language processing (EMNLP 2002) (pp. 1-8).](https://aclanthology.org/W02-1001.pdf)
  - p-delta rule — [Auer, P., Burgsteiner, H., & Maass, W. (2008). A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural networks, 21(5), 786-795.](https://web.archive.org/web/20110706095227/http://www.igi.tugraz.at/harry/psfiles/biopdelta-07.pdf)
- **Adaline/Madaline**
  - [Widrow, B. (1957). Propagation of statistics in systems. In IRE WESCON Convention Record, Part (Vol. 2, pp. 114-121).](https://web.archive.org/web/20230606185240/https://www-isl.stanford.edu/~widrow/papers/c1957propagationof.pdf)
  - [Widrow, B. (1959). Adaptive sample data systems-a statistical theory of adaptation. 1959 WESCON convention record. Part, 4, 74-85.](https://web.archive.org/web/20230606185325/https://www-isl.stanford.edu/~widrow/papers/c1959adaptivesampled.pdf)
  - Mattson, R. L. (1959). The design and analysis of an adaptive system for statistical classification (Doctoral dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering).
  - [Mattson, R. L. (1959). A Self-Organizing Binary System. Eastern Joint Computer Conference Record. Institute for Research and Education. New-York.](https://web.archive.org/web/20230816121218/http://www.bitsavers.org/pdf/afips/1959-12_%2316.pdf#212)
  - [Widrow, B. (1960, July). Adaptive sampled-data systems. In Proceedings of the First International Congress of the International Federation of Automatic Control (pp. 406-411).](https://web.archive.org/web/20100802044733/https://www-isl.stanford.edu/people/widrow/papers/c1960adaptivesampled.pdf)
  - [Widrow, B., & Hoff, M. E. (1960, August). Adaptive switching circuits. In IRE WESCON convention record (Vol. 4, No. 1, pp. 96-104).](https://web.archive.org/web/20220419145429/https://isl.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf)
  - [Widrow, B. (1960). An Adaptive Adaline Neuron Using Chemical Memistors. Technical Report.](https://www-isl.stanford.edu/~widrow/papers/t1960anadaptive.pdf)
  - Widrow, B. (1962). Generalization and information storage in networks of adaline neurons. Self-organizing systems, 435-461.
  - [Winter, C. R., & Widrow, B. (1988, July). MADALINE RULE II: A training algorithm for neural networks. In Second Annual International Conference on Neural Networks (pp. 1-401).](https://www-isl.stanford.edu/~widrow/papers/c1988madalinerule.pdf)
  - [Widrow, B., & Lehr, M. A. (1990). 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415-1442.](https://www-isl.stanford.edu/people/widrow/papers/j199030years.pdf)
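As a companion to the Perceptron and Adaline/Madaline groups above, here is a minimal sketch of the two classic single-neuron update rules: Rosenblatt's perceptron rule (update only on mistakes) and the Widrow-Hoff LMS/delta rule (gradient step on the squared error of a linear unit). The toy dataset and hyperparameters are made up purely for illustration.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=20):
    """Rosenblatt-style perceptron rule: update only on misclassified points (labels are +/-1)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:          # wrong side of the hyperplane (or on it)
                w += lr * yi * xi
    return w

def lms_train(X, y, lr=0.01, epochs=50):
    """Widrow-Hoff LMS / delta rule: stochastic gradient step on a linear unit's squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w += lr * (yi - xi @ w) * xi    # update proportional to the residual
    return w

# Toy linearly separable data, with a bias column appended.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
Xb = np.hstack([X, np.ones((100, 1))])
print("perceptron w:", perceptron_train(Xb, y))
print("LMS w:       ", lms_train(Xb, y))
```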
- [Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of physiology, 148(3), 574.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130/)
- [Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE transactions on electronic computers, (3), 326-334.](https://www-isl.stanford.edu/people/cover/papers/paper2.pdf) — goes over some of the geometrical connections applicable to machine learning
- [Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, (3), 299-307.](https://web.archive.org/web/20231216231715/https://people.idsia.ch/~juergen/amari1967.pdf)
- [S. Grossberg. Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, Indiana University Journal of Mathematics and Mechanics, 19:53-91, 1969.](https://www.jstor.org/stable/24902110)
- D. Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262:841, p 23-81, 1971.
- **Self-Organizing Maps (SOM)**
  - T. Kohonen. Correlation Matrix Memories. IEEE Transactions on Computers, C-21, p. 353-359, 1972.
  - [Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69.](https://www.cnbc.cmu.edu/~tai/nc19journalclubs/Kohonen1982_Article_Self-organizedFormationOfTopol.pdf)
  - T. Kohonen. Self-Organization and Associative Memory. Springer, second edition, 1988.
  - H. Ritter, T. Kohonen. Self-organizing semantic maps. Biological Cybernetics, 61(4):241-254, 1989.
  - [Fort, J. C. (2006). SOM's mathematics. Neural Networks, 19(6-7), 812-816.](http://samos.univ-paris1.fr/archives/WSOM05/papers/WSOM2005-128.pdf)
- K. Nakano. Associatron—A Model of Associative Memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2:3 p. 380-388, 1972.
- **Amari-Hopfield Net** (a toy storage-and-recall sketch follows this group)
  - [S. I. Amari (1972). Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions, C 21, 1197-1206, 1972.](https://web.archive.org/web/20231219144029/https://people.idsia.ch/~juergen/amari1972hopfield.pdf)
  - [Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8), 2554-2558.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC346238/)
  - [Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the national academy of sciences, 81(10), 3088-3092.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC345226/)
  - [Hopfield, J. J., & Tank, D. W. (1985). "Neural" computation of decisions in optimization problems. Biological cybernetics, 52(3), 141-152.](https://axon.cs.byu.edu/~martinez/classes/778/Papers/hopfield_tank.pdf)
  - [Storkey, A. J., & Valabregue, R. (1999). The basins of attraction of a new Hopfield learning rule. Neural Networks, 12(6), 869-876.](https://citeseerx.ist.psu.edu/doc/10.1.1.19.4681)
  - [A. P. Millan, J. J. Torres, J. Marro. How Memory Conforms to Brain Development. Front. Comput. Neuroscience, 2019](https://doi.org/10.3389/fncom.2019.00022)
  - [Krotov, D., & Hopfield, J. J. (2016). Dense associative memory for pattern recognition. Advances in neural information processing systems, 29.](https://arxiv.org/abs/1606.01164)
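A minimal sketch of the associative-memory idea in the Amari-Hopfield papers above: Hebbian (outer-product) storage of ±1 patterns in a symmetric weight matrix, followed by asynchronous threshold updates that recall a stored pattern from a corrupted cue. The pattern size, update count, and number of corrupted bits are arbitrary toy values.

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian outer-product rule; symmetric weights with a zero diagonal."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, cue, steps=2000, seed=0):
    """Asynchronous threshold updates; each flip never increases the energy E = -1/2 s.W.s."""
    rng = np.random.default_rng(seed)
    s = cue.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Store one toy +/-1 pattern and recall it from a corrupted cue.
rng = np.random.default_rng(1)
pattern = rng.choice([-1, 1], size=64)
W = hopfield_store(pattern[None, :])
cue = pattern.copy()
cue[:10] *= -1                              # corrupt 10 of the 64 bits
print("overlap with stored pattern:", hopfield_recall(W, cue) @ pattern / 64)
```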
- **Adaptive Resonance**
  - [Ellias, S. A., & Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20(2), 69-98.](https://web.archive.org/web/20230103161030/http://techlab.bu.edu/files/resources/articles_cns/EllGro1975BiolCyb.pdf)
  - STM — [Grossberg, S. (1982). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control, 332-378.](https://web.archive.org/web/20231230193212/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b7dd395303b4d1c241bfec7c3ec3b56294830630)
  - ART — [Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer vision, graphics, and image processing, 37(1), 54-115.](https://web.archive.org/web/20231230191300/https://d1wqtxts1xzle7.cloudfront.net/50945366/s0734-189x_2887_2980014-220161217-29007-b2dl45-libre.pdf?1482048333=&response-content-disposition=inline%3B+filename%3DA_massively_parallel_architecture_for_a.pdf&Expires=1703967123&Signature=b~~pUtnzyzc~ISnFCOxfu37gelaVRdDSD6uyIIOdliH9nPbmTeHoB-XeB0kLDibIhIeGb78LRD2mujneAl7cmKFk7LYHUe0So8EFJRqog5iBinke~CxLy1Ev7qZPIPLkCCVEAP6nknD2-6X7hwGs88ZUwKtMB9kJDYdCohGp8lPgeJbYIXMttvTYMaaVo9y00749zjpaJWcjRU6VMx6uq8IPEXH5hkISjKamBWbT-rYXFPV-IhiEQuqHcTVQdCyNiDVr7JskeO1PoR-WlNEn04u6paKNbNFWpGoCL3sLydGdgOHlSEZkjFC4d24MxZU5WCWp2pAOxsPwumwKK6QW3A__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA) — Mentions attention.
  - [Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive science, 11(1), 23-63.](https://web.archive.org/web/20060907041202/http://www.cns.bu.edu/Profiles/Grossberg/Gro1987CogSci.pdf)
  - [Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.](https://web.archive.org/web/20060904212143/http://cns-web.bu.edu/Profiles/Grossberg/CarGro1987AppliedOptics.pdf)
  - [Carpenter, G. A., & Grossberg, S. (1990). ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural networks, 3(2), 129-152.](https://web.archive.org/web/20060906014656/http://cns.bu.edu/Profiles/Grossberg/CarGro1990NN.pdf)
  - [Carpenter, G. A., Grossberg, S., & Rosen, D. B. (1991). ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition. Neural networks, 4(4), 493-504.](https://web.archive.org/web/20060519092850/http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNART2A.pdf)
  - [Carpenter, G. A., Grossberg, S., & Reynolds, J. H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural networks, 4(5), 565-588.](https://web.archive.org/web/20060519091848/http://cns.bu.edu/Profiles/Grossberg/CarGroRey1991NN.pdf)
  - [Carpenter, G. A., Grossberg, S., & Rosen, D. B. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural networks, 4(6), 759-771.](https://web.archive.org/web/20060519091505/http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNFuzzyART.pdf)
  - [Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on neural networks, 3(5), 698-713.](https://web.archive.org/web/20060519094345/http://cns.bu.edu/Profiles/Grossberg/CarGroMarRey1992IEEETransNN.pdf)
  - [Tan, A. H. (1995). Adaptive resonance associative map. Neural Networks, 8(3), 437-446.](https://web.archive.org/web/20210812140440/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=6227&context=sis_research)
  - [Williamson, J. R. (1996). Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps. Neural networks, 9(5), 881-897.](https://web.archive.org/web/20170903120705/https://open.bu.edu/bitstream/handle/2144/2180/95.003.pdf?sequence=1&isAllowed=y)
  - [Carpenter, G., & Grossberg, S. (1998). Adaptive resonance theory. Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems.](https://web.archive.org/web/20060519091948/http://cns.bu.edu/Profiles/Grossberg/CarGro2003HBTNN2.pdf)
  - [Anagnostopoulos, Georgios C., and M. Georgiopoulos. "Hypersphere ART and ARTMAP for unsupervised and supervised, incremental learning." Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium. Vol. 6. IEEE, 2000.](https://web.archive.org/web/20230103160314/http://techlab.bu.edu/files/resources/articles_tt/Anagnostopoulos_Georgiopoulos_2000.pdf)
  - Applies SOM techniques to ART — [Tan, A. H. (2006, May). Self-organizing neural architecture for reinforcement learning. In International Symposium on Neural Networks (pp. 470-475). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://www.researchgate.net/profile/Ah-Hwee-Tan/publication/220870704_Self-organizing_Neural_Architecture_for_Reinforcement_Learning/links/54b924590cf2c27adc491724/Self-organizing-Neural-Architecture-for-Reinforcement-Learning.pdf)
  - [Tan, A. H., Carpenter, G. A., & Grossberg, S. (2007, June). Intelligence through interaction: Towards a unified theory for learning. In International Symposium on Neural Networks (pp. 1094-1103). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://web.archive.org/web/20231113063613/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=7561&context=sis_research)
  - [Tscherepanow, M. (2010, September). TopoART: A topology learning hierarchical ART network. In International Conference on Artificial Neural Networks (pp. 157-167). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://web.archive.org/web/20230905002617/https://pub.uni-bielefeld.de/download/1925596/2499061)
  - [Tscherepanow, M. (2012). Incremental On-line Clustering with a Topology-Learning Hierarchical ART Neural Network Using Hyperspherical Categories. In ICDM (Poster and Industry Proceedings) (pp. 22-34).](https://web.archive.org/web/20220402114215/https://pub.uni-bielefeld.de/download/2498997/2517690/tscherepanow.marko2012incremental-ICDM.pdf)
  - [Tan, A. H., Subagdja, B., Wang, D., & Meng, L. (2019). Self-organizing neural networks for universal learning and multimodal memory encoding. Neural Networks, 120, 58-73.](https://web.archive.org/web/20231113061819/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=6206&context=sis_research)
- **Winner-take-all**
  - Amari, Shun-Ichi, and Michael A. Arbib. "Competition and cooperation in neural nets." Systems neuroscience (1977): 119-165.
  - [Lazzaro, J., Ryckebusch, S., Mahowald, M. A., & Mead, C. A. (1988). Winner-take-all networks of O(n) complexity. Advances in neural information processing systems, 1.](https://web.archive.org/web/20180729163828/http://www.dtic.mil/dtic/tr/fulltext/u2/a451466.pdf)
  - Yuille, A. L., & Grzywacz, N. M. (1989). A winner-take-all mechanism based on presynaptic inhibition feedback. Neural Computation, 1(3), 334-347.
  - [Coultrip, R., Granger, R., & Lynch, G. (1992). A cortical model of winner-take-all competition via lateral inhibition. Neural networks, 5(1), 47-54.](https://www.researchgate.net/profile/Richard-Granger/publication/222066408_A_cortical_model_of_winner-take-all_competition_via_lateral_inhibition/links/5e0f58c7a6fdcc2837550904/A-cortical-model-of-winner-take-all-competition-via-lateral-inhibition.pdf)
  - [Kaski, S., & Kohonen, T. (1994). Winner-take-all networks for physiological models of competitive learning. Neural Networks, 7(6-7), 973-984.](https://doi.org/10.1016/S0893-6080(05)80154-6)
  - [Fang, Y., Cohen, M. A., & Kincaid, T. G. (1996). Dynamics of a winner-take-all neural network. Neural Networks, 9(7), 1141-1154.](https://web.archive.org/web/20230512223643/http://www.fang.ece.ufl.edu/mypaper/nn96.pdf)
  - [Starzyk, J. A., & Jan, Y. W. (1996, August). A voltage based winner takes all circuit for analog neural networks. In Proceedings of the 39th Midwest Symposium on Circuits and Systems (Vol. 1, pp. 501-504). IEEE.](https://www.researchgate.net/profile/Janusz-Starzyk/publication/3690552_A_voltage_based_winner_takes_all_circuit_for_analog_neural_networks/links/004635212186f4b2e4000000/A-voltage-based-winner-takes-all-circuit-for-analog-neural-networks.pdf)
  - [Maass, W. (2000). On the computational power of winner-take-all. Neural computation, 12(11), 2519-2535.](https://web.archive.org/web/20231230194039/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=ae4c6cdcbbf1786b41cc6581fcf5eebef9d74986)
  - [Oster, M., Douglas, R., & Liu, S. C. (2009). Computation with spikes in a winner-take-all network. Neural computation, 21(9), 2437-2465.](https://web.archive.org/web/20181103205805/https://www.zora.uzh.ch/id/eprint/32038/1/neco.2009.07-08-829.pdf)
  - [Handrich, S., Herzog, A., Wolf, A., & Herrmann, C. S. (2009, September). A biologically plausible winner-takes-all architecture. In International Conference on Intelligent Computing (pp. 315-326). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://doi.org/10.1007/978-3-642-04020-7_34)
  - [Chen, Y. (2017). Mechanisms of winner-take-all and group selection in neuronal spiking networks. Frontiers in computational neuroscience, 11, 20.](https://doi.org/10.3389/fncom.2017.00020)
  - [Lynch, N., Musco, C., & Parter, M. (2019). Winner-take-all computation in spiking neural networks. arXiv preprint arXiv:1904.12591.](https://arxiv.org/abs/1904.12591)
- **Convolutional Neural Nets**
  - ReLU — Fukushima, K. (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics, 5(4), 322-333.
  - [Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4), 193-202.](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf)
  - [Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328-339.](https://web.archive.org/web/20230204092552/https://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf)
  - [Zhang, W., Itoh, K., Tanida, J., & Ichioka, Y. (1990). Parallel distributed processing model with local space-invariant interconnections and its optical architecture. Applied optics, 29(32), 4790-4797.](https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?resourcekey=0-WZ9_n1lL8vmwKdDTohTbSQ)
- **Elman Net**
  - [Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2), 179-211.](https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1)
  - [Liou, C. Y., Huang, J. C., & Yang, W. C. (2008). Modeling word perception using the Elman network. Neurocomputing, 71(16-18), 3150-3157.](http://ntur.lib.ntu.edu.tw//handle/246246/155195)
  - [Liou, C. Y., Cheng, W. C., Liou, J. W., & Liou, D. R. (2014). Autoencoder for words. Neurocomputing, 139, 84-96.](https://doi.org/10.1016/j.neucom.2013.09.055)
- **[Transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)**
  - Bozinovski, S., & Fulgosi, A. (1976). The influence of pattern similarity and transfer learning upon training of a base perceptron b2. In Proceedings of Symposium Informatica (Vol. 3, pp. 121-126).
  - [Bozinovski, S. (1981). Teaching space: A representation concept for adaptive pattern classification. COINS Technical Report No. 81-28.](https://web.archive.org/web/20230307031828/https://web.cs.umass.edu/publication/docs/1981/UM-CS-1981-028.pdf)
  - [Pratt, L. Y. (1992). Discriminability-based transfer between neural networks. Advances in neural information processing systems, 5.](https://proceedings.neurips.cc/paper/1992/file/67e103b0761e60683e83c559be18d40c-Paper.pdf)
  - [Pratt, L., & Jennings, B. (1996). A survey of transfer between connectionist networks. Connection Science, 8(2), 163-184.](https://doi.org/10.1080/095400996116866)
  - [Thrun, S., & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn (pp. 3-17). Boston, MA: Springer US.](https://www.google.com/books/edition/Learning_to_Learn/X_jpBwAAQBAJ?hl=en)
  - [Do, C. B., & Ng, A. Y. (2005). Transfer learning for text classification. Advances in neural information processing systems, 18.](https://proceedings.neurips.cc/paper_files/paper/2005/file/bf2fb7d1825a1df3ca308ad0bf48591e-Paper.pdf)
  - [Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks?. Advances in neural information processing systems, 27.](https://arxiv.org/abs/1411.1792)
  - [Huh, M., Agrawal, P., & Efros, A. A. (2016). What makes ImageNet good for transfer learning?. arXiv preprint arXiv:1608.08614.](https://arxiv.org/abs/1608.08614)
  - [Bozinovski, S. (2020). Reminder of the first paper on transfer learning in neural networks, 1976. Informatica, 44(3).](https://web.archive.org/web/20230721234607/https://www.informatica.si/index.php/informatica/article/viewFile/2828/1433)
  - [Zoph, B., Ghiasi, G., Lin, T. Y., Cui, Y., Liu, H., Cubuk, E. D., & Le, Q. (2020). Rethinking pre-training and self-training. Advances in neural information processing systems, 33, 3833-3845.](https://proceedings.neurips.cc/paper/2020/file/27e9661e033a73a6ad8cefcde965c54d-Paper.pdf)
- **Backpropagation** (a minimal worked example follows this group)
  - [Linnainmaa, S. (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, 16(2), 146-160.](https://web.archive.org/web/20231229080145/https://papers.baulab.info/papers/also/Linnainmaa-1976.pdf)
  - [Werbos, P. J. (2005, September). Applications of advances in nonlinear sensitivity analysis. In System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31–September 4, 1981 (pp. 762-770). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://www.researchgate.net/publication/225785177_Applications_of_advances_in_nonlinear_sensitivity_analysis)
  - [Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. California Univ San Diego La Jolla Inst for Cognitive Science.](https://apps.dtic.mil/dtic/tr/fulltext/u2/a164453.pdf)
  - [Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. nature, 323(6088), 533-536.](http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf)
  - [Pineda, F. (1987). Generalization of back propagation to recurrent and higher order neural networks. In Neural information processing systems.](https://proceedings.neurips.cc/paper/1987/hash/735b90b4568125ed6c3f678819b6e058-Abstract.html)
  - [Krauth, W., & Mézard, M. (1987). Learning algorithms with optimal stability in neural networks. Journal of Physics A: Mathematical and General, 20(11), L745.](http://www.lptms.u-psud.fr/membres/mezard/Pdf/87_MK_JPA.pdf)
  - P. W. Munro. A dual back-propagation scheme for scalar reinforcement learning. Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, WA, pages 165-176, 1987.
  - M. Mozer. A Focused Backpropagation Algorithm for Temporal Pattern Recognition. Complex Systems, 1989.
  - [LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551.](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf)
  - [Griewank, A. (2012). Who invented the reverse mode of differentiation. Documenta Mathematica, Extra Volume ISMP, 389-400.](https://web.archive.org/web/20231016161151/https://ftp.gwdg.de/pub/misc/EMIS/journals/DMJDMV/vol-ismp/52_griewank-andreas-b.pdf)
  - [Whittington, J. C., & Bogacz, R. (2017). An approximation of the error backpropagation algorithm in a predictive coding network with local hebbian synaptic plasticity. Neural computation, 29(5), 1229-1262.](https://doi.org/10.1162/NECO_a_00949)
  - [Song, Y., Lukasiewicz, T., Xu, Z., & Bogacz, R. (2020). Can the Brain Do Backpropagation?---Exact Implementation of Backpropagation in Predictive Coding Networks. Advances in neural information processing systems, 33, 22566-22579.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7610561/)
  - [Rosenbaum, R. (2022). On the relationship between predictive coding and backpropagation. Plos one, 17(3), e0266102.](https://doi.org/10.1371/journal.pone.0266102)
  - [Millidge, B., Tschantz, A., & Buckley, C. L. (2022). Predictive coding approximates backprop along arbitrary computation graphs. Neural Computation, 34(6), 1329-1368.](https://doi.org/10.1162/neco_a_01497)
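To make the backpropagation references concrete, here is a minimal worked example: a two-layer sigmoid network trained on XOR with the chain-rule gradients written out by hand. The layer sizes, learning rate, and iteration count are arbitrary illustrative choices, not taken from any of the papers above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)   # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)   # 4 hidden units -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule applied layer by layer (squared-error loss).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should end up close to [0, 1, 1, 0]
```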
- **Tensor Networks**
  - [Pellionisz, A., & Llinás, R. (1980). Tensorial approach to the geometry of brain function: Cerebellar coordination via a metric tensor. Neuroscience, 5(7), 1125-1136.](https://doi.org/10.1016/0306-4522(80)90191-8)
  - [Pellionisz, A., & Llinás, R. (1982). Tensor theory of brain function. The cerebellum as a space-time metric. In Competition and cooperation in neural nets (pp. 394-417). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-46466-9_23)
  - [Pellionisz, A. J. (1985). Robotics connected to neurobiology by tensor theory of brain function. In IEEE Proceedings of the International Conference on Cybernetics and Society (pp. 411-414).](http://www.junkdna.com/1985_ieee_proc_systems_man_cybernetics.pdf)
  - [Pellionisz, A., & Llinas, R. (1985). Tensor network theory of the metaorganization of functional geometries in the central nervous system. Neuroscience, 16(2), 245-273.](https://doi.org/10.1016/0306-4522(85)90001-6)
  - [Pellionisz, A. J. (1986). Tensor network theory of the central nervous system and sensorimotor modeling. In Brain theory (pp. 121-145). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-70911-1_8)
  - [Pellionisz, A. (1988). Tensorial aspects of the multidimensional massively parallel sensorimotor function of neuronal networks. Progress in brain research, 76, 341-354.](https://doi.org/10.1016/S0079-6123(08)64521-5)
  - [Pellionisz, A. (1989). Tensor geometry: A language of brains & neurocomputers. Generalized coordinates in neuroscience & robotics. In Neural Computers (pp. 381-391). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-83740-1_39)
  - [Pellionisz, A. (1989). Tensor Network Model of the Cerebellum and its Olivary System. The Olivo-Cerebellar System in Motor Control, 400-424.](http://www.junkdna.com/1989_strata_olive_book_springer.pdf)
  - [Lv, Z., Luo, S., Liu, Y., & Zheng, Y. (2006, August). Information geometry approach to the model selection of neural networks. In First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC'06) (Vol. 3, pp. 419-422). IEEE.](https://doi.org/10.1109/ICICIC.2006.463)
  - [Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. Advances in neural information processing systems, 26.](https://dl.acm.org/doi/10.5555/2999611.2999715)
  - [Orús, R. (2014). A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of physics, 349, 117-158.](https://doi.org/10.1016/j.aop.2014.06.013)
  - [Orús, R. (2014). Advances on tensor network theory: symmetries, fermions, entanglement, and holography. The European Physical Journal B, 87(11), 1-18.](https://doi.org/10.1140/epjb/e2014-50502-9)
  - [Phien, H. N., McCulloch, I. P., & Vidal, G. (2015). Fast convergence of imaginary time evolution tensor network algorithms by recycling the environment. Physical Review B, 91(11), 115137.](https://doi.org/10.1103/PhysRevB.91.115137)
  - [Evenbly, G., & Vidal, G. (2015). Tensor network renormalization. Physical review letters, 115(18), 180405.](https://doi.org/10.1103/PhysRevLett.115.180405)
  - [Czech, B., Lamprou, L., McCandlish, S., & Sully, J. (2015). Integral geometry and holography. Journal of High Energy Physics, 2015(10), 1-41.](https://doi.org/10.1007/JHEP10(2015)175)
  - [Sully, J. (2015) Geometry from Compression. Perimeter Institute Recorded Seminar Archive. 3 Feb.](https://pirsa.org/15020080)
    - It would be interesting to see what deep connections exist between this and [information geometry](https://en.wikipedia.org/wiki/Information_geometry).
  - [Czech, B., Lamprou, L., McCandlish, S., & Sully, J. (2016). Tensor networks from kinematic space. Journal of High Energy Physics, 2016(7), 1-38.](https://doi.org/10.1007/JHEP07(2016)100)
  - [Gan, W. C., & Shu, F. W. (2017). Holography as deep learning. International Journal of Modern Physics D, 26(12), 1743020.](https://doi.org/10.1142/S0218271817430209)
  - [Chirco, G., Oriti, D., & Zhang, M. (2018). Group field theory and tensor networks: towards a Ryu–Takayanagi formula in full quantum gravity. Classical and Quantum Gravity, 35(11), 115011.](https://doi.org/10.1088/1361-6382/aabf55)
  - [Ganchev, A. (2019, February). On bulk/boundary duality and deep networks. In AIP Conference Proceedings (Vol. 2075, No. 1, p. 100002). AIP Publishing LLC.](https://doi.org/10.1063/1.5091246)
  - [Zhang, Q., Guo, B., Kong, W., Xi, X., Zhou, Y., & Gao, F. (2021). Tensor-based dynamic brain functional network for motor imagery classification. Biomedical Signal Processing and Control, 69, 102940.](https://doi.org/10.1016/j.bspc.2021.102940)
  - [Kobayashi, M. (2021). Information geometry of hyperbolic-valued Boltzmann machines. Neurocomputing, 431, 163-168.](https://doi.org/10.1016/j.neucom.2020.12.048)
  - [Howard, E. (2021). Holographic renormalization with machine learning. In Emerging Technologies in Data Mining and Information Security (pp. 253-261). Springer, Singapore.](https://doi.org/10.1007/978-981-15-9774-9_24)
  - [Park, C., Hwang, C. O., Cho, K., & Kim, S. J. (2022). Dual Geometry of Entanglement Entropy via Deep Learning. arXiv preprint arXiv:2205.04445.](https://doi.org/10.48550/arXiv.2205.04445)
  - [Gesteau, E., Marcolli, M., & Parikh, S. (2022). Holographic tensor networks from hyperbolic buildings. arXiv preprint arXiv:2202.01788.](https://doi.org/10.48550/arXiv.2202.01788)
- **Autoencoders**
  - [Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of mathematical biology, 15, 267-273.](http://jeti.uni-freiburg.de/studenten_seminar/term_paper_WS_16_17/Oja.pdf)
  - [Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological cybernetics, 59(4-5), 291-294.](https://www.researchgate.net/profile/Herve-Bourlard/publication/19959069_Auto-Association_by_Multilayer_Perceptrons_and_Singular_Value_Decomposition/links/57600aaa08aeeada5bc2b4cc/Auto-Association-by-Multilayer-Perceptrons-and-Singular-Value-Decomposition.pdf)
  - [Kramer, M. A. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE journal, 37(2), 233-243.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=87c280d0dc204ca5db0d325991a21c211aeec866)
  - [Japkowicz, N., Hanson, S. J., & Gluck, M. A. (2000). Nonlinear autoassociation is not equivalent to PCA. Neural computation, 12(3), 531-545.](https://direct.mit.edu/neco/article-abstract/12/3/531/6350/Nonlinear-Autoassociation-Is-Not-Equivalent-to-PCA?redirectedFrom=fulltext)
  - [Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. science, 313(5786), 504-507.](https://www.science.org/doi/10.1126/science.1127647)
  - [Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. In Artificial Neural Networks and Machine Learning–ICANN 2011: 21st International Conference on Artificial Neural Networks, Espoo, Finland, June 14-17, 2011, Proceedings, Part I 21 (pp. 44-51). Springer Berlin Heidelberg.](https://web.archive.org/web/20181219212445/https://www.cs.toronto.edu/~fritz/absps/transauto6.pdf)
  - [Chicco, D., Sadowski, P., & Baldi, P. (2014, September). Deep autoencoder neural networks for gene ontology annotation predictions. In Proceedings of the 5th ACM conference on bioinformatics, computational biology, and health informatics (pp. 533-540).](https://dl.acm.org/doi/10.1145/2649387.2649442)
  - [Plaut, E. (2018). From principal subspaces to principal components with linear autoencoders. arXiv preprint arXiv:1804.10253.](https://arxiv.org/abs/1804.10253)
  - [Kingma, D. P., & Welling, M. (2019). An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.](https://arxiv.org/abs/1906.02691)
- [Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive science, 9(1), 147-169.](https://www.cs.toronto.edu/~fritz/absps/cogscibm.pdf)
- [Krauth, W., & Mézard, M. (1987). Learning algorithms with optimal stability in neural networks. Journal of Physics A: Mathematical and General, 20(11), L745.](https://www.researchgate.net/profile/Marc-Mezard/publication/230920580_Learning_algorithms_with_optimal_stability_in_neural_networks/links/09e4151439966bc5a8000000/Learning-algorithms-with-optimal-stability-in-neural-networks.pdf)
- [Grossberg, S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural networks, 1(1), 17-61.](https://web.archive.org/web/20231228180021/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8ae6f957b6a4615ade96cb61a6d6b3e6083cadbd)
- **BAM** — [Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Systems, man, and Cybernetics, 18(1), 49-60.](https://ieeexplore.ieee.org/document/87054)
- **Gradient Descent**
  - [Lewis, J. P. (1988, July). Creation by refinement: a creativity paradigm for gradient descent learning networks. In ICNN (pp. 229-233).](https://ieeexplore.ieee.org/document/23933)
  - [Hanson, S. J. (1990). A stochastic version of the delta rule. Physica D: Nonlinear Phenomena, 42(1-3), 265-272.](https://doi.org/10.1016/0167-2789(90)90081-Y)
  - [LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.](https://web.archive.org/web/20180221193253/yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
  - [Amari, S. I., & Douglas, S. C. (1998, May). Why natural gradient?. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98 (Cat. No. 98CH36181) (Vol. 2, pp. 1213-1216). IEEE.](https://web.archive.org/web/20240104155350/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=4a165d81336028cb257191540109178adf275fea) — applies learnings from Information Geometry to gradient descent.
  - [Hochreiter, S., Younger, A. S., & Conwell, P. R. (2001). Learning to learn using gradient descent. In Artificial Neural Networks---ICANN 2001: International Conference Vienna, Austria, August 21--25, 2001 Proceedings 11 (pp. 87-94). Springer Berlin Heidelberg.](http://www.bioinf.jku.at/publications/older/1504.pdf)
  - [Du, S. S., Zhai, X., Poczos, B., & Singh, A. (2018). Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054.](https://arxiv.org/abs/1810.02054)
- [Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4), 303-314.](https://hal.science/hal-03753170/document)
- **Networks training networks**
  - [J. Schmidhuber. Networks adjusting networks. In J. Kindermann and A. Linden, editors, Proceedings of `Distributed Adaptive Neural Information Processing', St.Augustin, 24.-25.5. 1989, pages 197-208. Oldenbourg, 1990. Extended version: TR FKI-125-90 (revised), Institut für Informatik, TUM.](https://web.archive.org/web/20231219143905/https://people.idsia.ch/~juergen/FKI-125-90ocr.pdf)
  - J. Schmidhuber. Additional remarks on G. Lukes' review of Schmidhuber's paper `Recurrent networks adjusted by adaptive critics'. Neural Network Reviews, 4(1):43, 1990.
  - [Kinzel, W., & Rujan, P. (1990). Improving a network generalization ability by selecting examples. Europhysics Letters, 13(5), 473.](https://iopscience.iop.org/article/10.1209/0295-5075/13/5/016/meta)
- [Hinton, G. E. (1990). Connectionist learning procedures. In Machine learning (pp. 555-610). Morgan Kaufmann.](https://web.archive.org/web/20220616164359/https://files.eric.ed.gov/fulltext/ED294889.pdf) — Provides a pretty good summary of previous work up to this point. Doesn't mention everything though.
- **Renormalization/Information Bottleneck**
  - [Equitz, W. H., & Cover, T. M. (1991). Successive refinement of information. IEEE Transactions on Information Theory, 37(2), 269-275.](https://web.archive.org/web/20221222072855/https://www-isl.stanford.edu/people/cover/papers/transIT/0269equi.pdf)
  - [Kramer, M. A. (1992). Autoassociative neural networks. Computers & chemical engineering, 16(4), 313-328.](https://doi.org/10.1016/0098-1354(92)80051-A)
  - [Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152).](https://dl.acm.org/doi/10.1145/130385.130401)
  - [Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method. arXiv preprint physics/0004057.](https://arxiv.org/abs/physics/0004057)
  - [Gilad-Bachrach, R., Navot, A., & Tishby, N. (2003). An information theoretic tradeoff between complexity and accuracy. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings (pp. 595-609). Springer Berlin Heidelberg.](https://www.researchgate.net/profile/Naftali-Tishby/publication/2831344_An_Information_Theoretic_Tradeoff_between_Complexity_and_Accuracy/links/0c96051816ace453e2000000/An-Information-Theoretic-Tradeoff-between-Complexity-and-Accuracy.pdf)
  - [Shamir, O., Sabato, S., & Tishby, N. (2010). Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29-30), 2696-2711.](https://doi.org/10.1016/j.tcs.2010.04.006)
  - [Neyshabur, B., Tomioka, R., & Srebro, N. (2014). In search of the real inductive bias: On the role of implicit regularization in deep learning. arXiv preprint arXiv:1412.6614.](https://arxiv.org/abs/1412.6614)
  - [Mehta, P., & Schwab, D. J. (2014). An exact mapping between the variational renormalization group and deep learning. arXiv preprint arXiv:1410.3831.](https://arxiv.org/abs/1410.3831)
  - [Tishby, N., & Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In 2015 ieee information theory workshop (itw) (pp. 1-5). IEEE.](https://arxiv.org/abs/1503.02406)
  - [Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2016). Deep variational information bottleneck. arXiv preprint arXiv:1612.00410.](https://arxiv.org/abs/1612.00410)
  - [Shwartz-Ziv, R., & Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.](https://arxiv.org/abs/1703.00810)
  - [Saxe, A. M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B. D., & Cox, D. D. (2019). On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12), 124020.](https://artemyk.github.io/assets/pdf/papers/Saxe%20et%20al_2019_On%20the%20information%20bottleneck%20theory%20of%20deep%20learning.pdf)
  - [Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.](https://dl.acm.org/doi/abs/10.1145/3446776)
- **Continuous-time Networks**
  - [Funahashi, K. I., & Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural networks, 6(6), 801-806.](https://www.sciencedirect.com/science/article/abs/pii/S089360800580125X)
  - [Williamson, M. M. (1998). Neural control of rhythmic arm movements. Neural networks, 11(7-8), 1379-1394.](https://groups.csail.mit.edu/lbr/hrg/1998/nn-journal.pdf)
  - [Williamson, M. M. (1998, October). Rhythmic robot arm control using oscillators. In Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications (Cat. No. 98CH36190) (Vol. 1, pp. 77-83). IEEE.](https://groups.csail.mit.edu/lbr/hrg/1998/mattw-iros98.pdf)
  - [Zehr, E. P., Balter, J. E., Ferris, D. P., Hundza, S. R., Loadman, P. M., & Stoloff, R. H. (2007). Neural regulation of rhythmic arm and leg movement is conserved across human locomotor tasks. The Journal of physiology, 582(1), 209-227.](https://physoc.onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.2007.133843)
  - [Zehr, E. P. (2005). Neural control of rhythmic human movement: the common core hypothesis. Exercise and sport sciences reviews, 33(1), 54-60.](https://journals.lww.com/acsm-essr/Fulltext/2005/01000/Neural_Control_of_Rhythmic_Human_Movement__The.10.aspx)
  - [Hasani, R. M., Haerle, D., & Grosu, R. (2016, June). Efficient modeling of complex analog integrated circuits using neural networks. In 2016 12th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME) (pp. 1-4). IEEE.](https://ti.tuwien.ac.at/cps/people/grosu/files/prime16.pdf)
  - [Mozer, M. C., Kazakov, D., & Lindsey, R. V. (2017). Discrete event, continuous time rnns. arXiv preprint arXiv:1710.04110.](https://arxiv.org/abs/1710.04110)
  - [Gleeson, P., Lung, D., Grosu, R., Hasani, R., & Larson, S. D. (2018). c302: a multiscale framework for modelling the nervous system of Caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1758), 20170379.](https://royalsocietypublishing.org/doi/full/10.1098/rstb.2017.0379)
  - [Gu, A., Dao, T., Ermon, S., Rudra, A., & Ré, C. (2020). Hippo: Recurrent memory with optimal polynomial projections. Advances in neural information processing systems, 33, 1474-1487.](https://proceedings.neurips.cc/paper/2020/file/102f0bb6efb3a6128a3c750dd16729be-Paper.pdf)
  - [Lechner, M., Hasani, R., Amini, A., Henzinger, T. A., Rus, D., & Grosu, R. (2020). Neural circuit policies enabling auditable autonomy. Nature Machine Intelligence, 2(10), 642-652.](https://web.archive.org/web/20211124185521/https://publik.tuwien.ac.at/files/publik_292280.pdf)
  - [Hasani, R., Lechner, M., Amini, A., Rus, D., & Grosu, R. (2021). Liquid Time-constant Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7657-7666.](https://doi.org/10.1609/aaai.v35i9.16936) — [preprint](https://arxiv.org/abs/2006.04439)
  - [Vorbach, C., Hasani, R., Amini, A., Lechner, M., & Rus, D. (2021). Causal navigation by continuous-time neural networks. Advances in Neural Information Processing Systems, 34, 12425-12440.](https://proceedings.neurips.cc/paper/2021/file/67ba02d73c54f0b83c05507b7fb7267f-Paper.pdf)
  - [Hasani, R., Lechner, M., Amini, A., Liebenwein, L., Ray, A., Tschaikowski, M., ... & Rus, D. (2022). Closed-form continuous-time neural networks. Nature Machine Intelligence, 1-12.](https://www.nature.com/articles/s42256-022-00556-7)
  - [Hasani, R., Lechner, M., Wang, T. H., Chahine, M., Amini, A., & Rus, D. (2022). Liquid structural state-space models. arXiv preprint arXiv:2209.12951.](https://arxiv.org/pdf/2209.12951.pdf)
- [Balcázar, J. L., Gavalda, R., & Siegelmann, H. T. (1997). Computational power of neural networks: A characterization in terms of Kolmogorov complexity. IEEE Transactions on Information Theory, 43(4), 1175-1183.](https://people.cs.umass.edu/~binds/papers/1997_Balcazar_IEEETransInfoTheory.pdf)
- **LSTM** (a single-cell forward-pass sketch follows this group)
  - [Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory)
  - [Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=11540131eae85b2e11d53df7f1360eeb6476e7f4)
  - [Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling.](https://web.archive.org/web/20180424203806/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf)
  - [Li, X., & Wu, X. (2015, April). Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In 2015 ieee international conference on acoustics, speech and signal processing (icassp) (pp. 4520-4524). IEEE.](https://arxiv.org/abs/1410.4281)
  - [Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2015). LSTM: a search space odyssey. CoRR. arXiv preprint arXiv:1503.04069.](https://arxiv.org/abs/1503.04069)
  - [Malhotra, P., Vig, L., Shroff, G., & Agarwal, P. (2015, April). Long Short Term Memory Networks for Anomaly Detection in Time Series. In ESANN (Vol. 2015, p. 89).](https://web.archive.org/web/20200830034708/https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-56.pdf)
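A minimal sketch of a single LSTM cell's forward step in the standard gated form (input, forget, and output gates plus a candidate cell state), to make the cited architecture concrete. The dimensions and the stacked-weight layout are choices of this sketch, not of any particular paper above.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [x; h_prev] to the four stacked gate pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])                     # input gate
    f = sigmoid(z[H:2 * H])                 # forget gate
    o = sigmoid(z[2 * H:3 * H])             # output gate
    g = np.tanh(z[3 * H:4 * H])             # candidate cell state
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Toy dimensions: 3 inputs, 5 hidden units; run a short random sequence through one cell.
rng = np.random.default_rng(0)
D, H = 3, 5
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(10):
    h, c = lstm_step(rng.normal(size=D), h, c, W, b)
print(h)
```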
- **Deep Neural Networks**
  - [Hinton, G. E. (2007). Learning multiple layers of representation. Trends in cognitive sciences, 11(10), 428-434.](http://www.cs.toronto.edu/~hinton/absps/tics.pdf)
  - [Bengio, Y. (2009). Learning deep architectures for AI. Foundations and trends® in Machine Learning, 2(1), 1-127.](https://web.archive.org/web/20230720043840/https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf)
  - [LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521(7553), 436-444.](https://neuron.eng.wayne.edu/auth/ece512/Deep_Learning_Hinton.pdf)
  - [Schmidhuber, J. (2020). Deep learning: our miraculous year 1990-1991. arXiv preprint arXiv:2005.05744.](https://arxiv.org/abs/2005.05744)
  - [Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks?. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper_files/paper/2014/hash/375c71349b295fbe2dcdca9206f20a06-Abstract.html)
  - [Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 427-436).](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf)
  - [Jentzen, A., Kuckuck, B., & von Wurstemberger, P. (2023). Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory. arXiv preprint arXiv:2310.20360.](https://arxiv.org/abs/2310.20360)
- **Deep Recurrent Neural Networks**
  - CTRNN — [Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS computational biology, 4(11), e1000220.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570613/)
- **Deep Convolutional Neural Networks**
  - [Cireşan, D., Meier, U., & Schmidhuber, J. (2012, June). Multi-column deep neural networks for image classification. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3642-3649). IEEE.](https://arxiv.org/pdf/1202.2745.pdf)
  - [Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
  - [Maitra, D. S., Bhattacharya, U., & Parui, S. K. (2015, August). CNN based common approach to handwritten character recognition of multiple scripts. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 1021-1025). IEEE.](https://ieeexplore.ieee.org/document/7333916)
  - [Zhang, R. (2019, May). Making convolutional networks shift-invariant again. In International conference on machine learning (pp. 7324-7334). PMLR.](http://proceedings.mlr.press/v97/zhang19a/zhang19a.pdf)
  - [Mouton, C., Myburgh, J. C., & Davel, M. H. (2020). Stride and translation invariance in CNNs. In Artificial Intelligence Research: First Southern African Conference for AI Research, SACAIR 2020, Muldersdrift, South Africa, February 22-26, 2021, Proceedings 1 (pp. 267-281). Springer International Publishing.](https://arxiv.org/pdf/2103.10097.pdf)
- [Stelzer, F., Röhm, A., Vicente, R., Fischer, I., & Yanchuk, S. (2021). Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops. Nature communications, 12(1), 5164.](https://www.nature.com/articles/s41467-021-25427-4)
- **Hyperdimensional Computing**
  - [Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive computation, 1, 139-159.](https://cs.uwaterloo.ca/~jhoey/teaching/cogsci600/papers/Kanerva09.pdf)
    - Based on Distributed Memories, Holographic Reduced Representation, Spatter Code, Semantic Vectors, Latent Semantic Analysis, Context-dependent Thinning, Vector Symbolic Architecture, and a few other ideas.
  - [Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial intelligence, 46(1-2), 159-216.](http://www.lscp.net/persons/dupoux/teaching/AT1_2012/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)
  - [Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural networks, 6(3), 623-641.](https://web.archive.org/web/20231230170300/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=645698cdb52b0f1fdcac55da91e56f7ffd935d15)
- **Hyperdimensional Computing**
  - [Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive computation, 1, 139-159.](https://cs.uwaterloo.ca/~jhoey/teaching/cogsci600/papers/Kanerva09.pdf)
  - Based on Distributed Memories, Holographic Reduced Representation, Spatter Code, Semantic Vectors, Latent Semantic Analysis, Context-dependent Thinning, Vector Symbolic Architecture, and a few other ideas.
  - [Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial intelligence, 46(1-2), 159-216.](http://www.lscp.net/persons/dupoux/teaching/AT1_2012/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)
  - [Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural networks, 6(3), 623-641.](https://web.archive.org/web/20231230170300/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=645698cdb52b0f1fdcac55da91e56f7ffd935d15)
  - Kanerva, P. (1994). The spatter code for encoding concepts at many levels. In ICANN’94: Proceedings of the International Conference on Artificial Neural Networks, Sorrento, Italy, 26–29 May 1994, Volume 1, Parts 1 and 2 (pp. 226-229). Springer London.
  - Gayler, R. W. (1998). Multiplicative binding, representation operators & analogy (workshop poster).
  - [Rachkovskij, D. A., & Kussul, E. M. (2001). Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Computation, 13(2), 411-452.](https://web.archive.org/web/20231230171234/https://core.ac.uk/download/pdf/207530132.pdf)
  - [Gayler, R. W. (2004). Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. arXiv preprint cs/0412059.](https://arxiv.org/abs/cs/0412059)
  - [Aerts, D., & Czachor, M. (2008). Tensor-product versus geometric-product coding. Physical Review A, 77(1), 012316.](https://arxiv.org/pdf/0709.1268.pdf)
  - See full bibliography at https://view-awesome-table.com/-M9Q2TlY5jMY3xXFfNHl/view
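
  To make the binding and bundling vocabulary above concrete, here is a toy NumPy sketch (my own, in the spirit of Kanerva 2009 and Gayler's VSA papers, not code from them) using random bipolar hypervectors:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  D = 10_000                                   # hypervector dimensionality

  def random_hv():
      return rng.choice([-1, 1], size=D)

  def bind(a, b):                              # binding: output is dissimilar to both inputs, and invertible
      return a * b

  def bundle(*vs):                             # bundling: output is similar to each input
      return np.sign(np.sum(vs, axis=0))

  def similarity(a, b):                        # normalised dot product, roughly 0 for unrelated vectors
      return float(a @ b) / D

  # Encode a tiny record {colour: red, shape: square}, then query the colour back.
  colour, red, shape, square = (random_hv() for _ in range(4))
  record = bundle(bind(colour, red), bind(shape, square))
  decoded = bind(record, colour)               # for bipolar vectors, binding is its own inverse

  print(round(similarity(decoded, red), 2))    # about 0.5, far above chance, so the answer is "red"
  print(round(similarity(decoded, square), 2)) # about 0.0, unrelated
  ```
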
- [Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.](https://arxiv.org/abs/1207.0580)
- **seq2seq**
  - [Graves, A. (2012). Sequence transduction with recurrent neural networks. arXiv preprint arXiv:1211.3711.](https://doi.org/10.48550/arXiv.1211.3711)
  - [Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper/2014/hash/a14ac55a4f27472c5d894ec1c3c743d2-Abstract.html)
  - [Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.](https://doi.org/10.48550/arXiv.1409.0473)
- **GAN** — [Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf)
- **Distillation**
  - [Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.](https://arxiv.org/abs/1503.02531)
  - [Anil, R., Pereyra, G., Passos, A., Ormandi, R., Dahl, G. E., & Hinton, G. E. (2018). Large scale distributed neural network training through online distillation. arXiv preprint arXiv:1804.03235.](https://arxiv.org/abs/1804.03235)
- **Diffusion**
  - [Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June). Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning (pp. 2256-2265). PMLR.](https://arxiv.org/abs/1503.03585)
  - [Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32.](https://arxiv.org/abs/1907.05600)
  - [Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in neural information processing systems, 33, 6840-6851.](https://arxiv.org/abs/2006.11239)
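
  For reference, the forward (noising) process these papers build on, in the notation of Ho et al. (2020), with variance schedule $\beta_1, \dots, \beta_T$ and $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$:

  $$
  q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),
  \qquad
  q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)
  $$

  The DDPM of Ho et al. trains a network $\epsilon_\theta(x_t, t)$ to predict the injected noise, minimising the simplified objective $\mathbb{E}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right]$ with $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$.
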
- [Rakkiyappan, R., Chandrasekar, A., Lakshmanan, S., & Park, J. H. (2015). Exponential stability for Markovian jumping stochastic BAM neural networks with mode-dependent probabilistic time-varying delays and impulse control. Complexity, 20(3), 39-65.](https://onlinelibrary.wiley.com/doi/10.1002/cplx.21503)
- [Soudry, D., Di Castro, D., Gal, A., Kolodny, A., & Kvatinsky, S. (2015). Memristor-based multilayer neural networks with online gradient descent training. IEEE transactions on neural networks and learning systems, 26(10), 2408-2421.](https://web.archive.org/web/20160916163309/http://webee.technion.ac.il/people/skva/TNNLS.pdf)
- [Widrow, B., Greenblatt, A., Kim, Y., & Park, D. (2013). The no-prop algorithm: A new learning algorithm for multilayer neural networks. Neural Networks, 37, 182-188.](https://web.archive.org/web/20230607025139/https://www-isl.stanford.edu/~widrow/papers/131.no_prop_neural_networks.pdf)
- [Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.](https://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
- [Demircigil, M., Heusel, J., Löwe, M., Upgang, S., & Vermet, F. (2017). On a model of associative memory with huge storage capacity. Journal of Statistical Physics, 168, 288-299.](https://link.springer.com/article/10.1007/s10955-017-1806-y)
- [Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in neural information processing systems, 30.](https://proceedings.neurips.cc/paper/2017/file/2cad8fa47bbef282badbb8de5374b894-Paper.pdf)
- [Li, Y., Gimeno, F., Kohli, P., & Vinyals, O. (2020). Strong generalization and efficiency in neural programs. arXiv preprint arXiv:2007.03629.](https://arxiv.org/pdf/2007.03629.pdf)
- [Bubeck, S., & Sellke, M. (2021). A universal law of robustness via isoperimetry. Advances in Neural Information Processing Systems, 34, 28811-28822.](https://arxiv.org/abs/2105.12806)
- [Barnett, A. J., Guo, Z., Jing, J., Ge, W., Rudin, C., & Westover, M. B. (2022). An Interpretable Machine Learning System to Identify EEG Patterns on the Ictal-Interictal-Injury Continuum. arXiv preprint arXiv:2211.05207.](https://arxiv.org/abs/2211.05207)
- [Song, Y., Millidge, B., Salvatori, T., Lukasiewicz, T., Xu, Z., & Bogacz, R. (2024). Inferring neural activity before plasticity as a foundation for learning beyond backpropagation. Nature Neuroscience, 1-11.](https://doi.org/10.1038/s41593-023-01514-1)

## Semantic Word Embeddings and Large Language Models

See [Semantic Word Embeddings / Word Vectors](/2IBLoQz4SsGougvgimR93w)

## Voice recognition and generation

- [Hunt, A. J., & Black, A. W. (1996, May). Unit selection in a concatenative speech synthesis system using a large speech database. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings (Vol. 1, pp. 373-376). IEEE.](https://www.ee.columbia.edu/~dpwe/e6820/papers/HuntB96-speechsynth.pdf)
- [Black, A. W., Zen, H., & Tokuda, K. (2007, April). Statistical parametric speech synthesis. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 4, pp. IV-1229). IEEE.](https://citeseerx.ist.psu.edu/doc/10.1.1.154.9874)
- [Loots, L. (2010). Data-driven augmentation of pronunciation dictionaries (Doctoral dissertation, Stellenbosch: University of Stellenbosch).](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.832.2872&rep=rep1&type=pdf)
- [Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.](https://arxiv.org/abs/1609.03499) (a toy sketch of the dilated causal convolutions WaveNet stacks appears at the end of this list)
- [Oord, A. V. D., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., ... & Hassabis, D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis. arXiv preprint arXiv:1711.10433.](https://arxiv.org/abs/1711.10433)
- [Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., ... & Wu, Y. (2017). Natural TTS synthesis by conditioning WaveNet on Mel spectrogram predictions. arXiv preprint arXiv:1712.05884.](https://arxiv.org/abs/1712.05884)
- [Chen, Y., Assael, Y., Shillingford, B., Budden, D., Reed, S., Zen, H., ... & de Freitas, N. (2018). Sample efficient adaptive text-to-speech. arXiv preprint arXiv:1809.10460.](https://arxiv.org/abs/1809.10460v1)
- [Hsu, W. N., Zhang, Y., Weiss, R. J., Zen, H., Wu, Y., Wang, Y., ... & Pang, R. (2018). Hierarchical generative modeling for controllable speech synthesis. arXiv preprint arXiv:1810.07217.](https://arxiv.org/abs/1810.07217)
- [Li, Y., & Mandt, S. (2018). Disentangled sequential autoencoder. arXiv preprint arXiv:1803.02991.](https://arxiv.org/abs/1803.02991)
- [Chorowski, J., Weiss, R. J., Bengio, S., & Van Den Oord, A. (2019). Unsupervised speech representation learning using wavenet autoencoders. IEEE/ACM transactions on audio, speech, and language processing, 27(12), 2041-2053.](https://arxiv.org/abs/1901.08810)
- [Bińkowski, M., Donahue, J., Dieleman, S., Clark, A., Elsen, E., Casagrande, N., ... & Simonyan, K. (2019). High fidelity speech synthesis with adversarial networks. arXiv preprint arXiv:1909.11646.](https://arxiv.org/abs/1909.11646)
- [Habib, R., Mariooryad, S., Shannon, M., Battenberg, E., Skerry-Ryan, R. J., Stanton, D., ... & Bagby, T. (2019). Semi-supervised generative modeling for controllable speech synthesis. arXiv preprint arXiv:1910.01709.](https://arxiv.org/abs/1910.01709)
- [Chung, Y. A., Wang, Y., Hsu, W. N., Zhang, Y., & Skerry-Ryan, R. J. (2019, May). Semi-supervised training for improving data efficiency in end-to-end speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6940-6944). IEEE.](https://arxiv.org/abs/1808.10128)
- [Ren, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019, May). Almost unsupervised text to speech and automatic speech recognition. In International conference on machine learning (pp. 5410-5419). PMLR.](https://arxiv.org/abs/1905.06791)
- [Kong, J., Kim, J., & Bae, J. (2020). Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33, 17022-17033.](https://arxiv.org/abs/2010.05646v2)
- [Gong, Y., Chung, Y. A., & Glass, J. (2021). Ast: Audio spectrogram transformer. arXiv preprint arXiv:2104.01778.](https://arxiv.org/abs/2104.01778)
- [Leong, C. H., Huang, Y. H., & Chien, J. T. (2021). Online compressive transformer for end-to-end speech recognition. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 1500-1504). International Speech Communication Association.](https://www.isca-speech.org/archive/interspeech_2021/leong21_interspeech.html)
- [Agbavor, F., & Liang, H. (2022). Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), e0000168.](https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000168)
- [Lohrenz, T., Li, Z., & Fingscheidt, T. (2021). Multi-encoder learning and stream fusion for transformer-based end-to-end automatic speech recognition. arXiv preprint arXiv:2104.00120.](https://arxiv.org/abs/2104.00120)
- [Ristea, N. C., Ionescu, R. T., & Khan, F. S. (2022). SepTr: Separable Transformer for Audio Spectrogram Processing. arXiv preprint arXiv:2203.09581.](https://arxiv.org/abs/2203.09581)
- [Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.](https://arxiv.org/abs/2212.04356)
- [Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., ... & Wei, F. (2023). Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. arXiv preprint arXiv:2301.02111.](https://arxiv.org/abs/2301.02111)
- [Criminals Used AI To Clone Company Director's Voice And Steal $35 Million](https://screenrant.com/ai-deepfake-cloned-voice-bank-scam-theft-millions/)
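
For a concrete picture of the WaveNet building block referenced above, here is a toy NumPy sketch (my own illustration, not code from Oord et al., 2016) of stacked dilated causal convolutions: each layer doubles its dilation, so the receptive field grows exponentially with depth while every output sample depends only on past input samples.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Causal 1-D convolution with a 2-tap filter w at the given dilation:
    y[t] = w[0] * x[t - dilation] + w[1] * x[t], with zero padding on the left."""
    padded = np.concatenate([np.zeros(dilation), x])
    return w[0] * padded[:len(x)] + w[1] * padded[dilation:]

rng = np.random.default_rng(0)
signal = rng.standard_normal(32)                       # a short raw-audio-like signal

x = signal
for dilation in [1, 2, 4, 8]:                          # receptive field: 1 + 1 + 2 + 4 + 8 = 16 samples
    w = rng.standard_normal(2)
    x = np.tanh(causal_dilated_conv(x, w, dilation))   # the real model adds gated activations and residual/skip connections

print(x.shape)   # (32,)  same length as the input; sample t never sees samples after t
```
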