# Foundational Papers in Neural Networks
Also see [Semantic Word Embeddings / Word Vectors](https://hackmd.io/@adaburrows/B1_cPiBt6) and [Software for Artificial Neural Networks](https://hackmd.io/@adaburrows/BkhcCiNY6).
## Background and Overview
[Carpenter, G. A. (1989). Neural network models for pattern recognition and associative memory. Neural networks, 2(4), 243-257.](https://apps.dtic.mil/sti/tr/pdf/ADA259428.pdf#page=16)
[MacKay, D. J. C. (2003). Information theory, inference and learning algorithms. Cambridge University Press.](https://www.inference.org.uk/itprnn/book.pdf) — a good overview of the information theory needed to understand artificial neural networks, and it also covers several neural network models.
## Historical Papers
For a more complete list, see https://people.idsia.ch/~juergen/deep-learning-history.html
- **Hebbian Learning** — [Hebb, D. O. (1949). The organization of behavior. A neuropsychological theory.](https://pure.mpg.de/rest/items/item_2346268/component/file_2346267/content)
- **McCulloch-Pitts**
- **Considered first neural net model** — [McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of mathematical biophysics, 5.](https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf)
- [S.C. Kleene. Representation of Events in Nerve Nets and Finite Automata. Automata Studies, Editors: C.E. Shannon and J. McCarthy, Princeton University Press, p. 3-42, Princeton, N.J., 1956.](https://web.archive.org/web/20221207125436/https://apps.dtic.mil/sti/pdfs/ADA596138.pdf)
- [Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34, 43-98.](https://web.archive.org/web/20231128070228/https://personalpages.manchester.ac.uk/staff/nikolaos.kyparissas/uploads/VonNeumann1956.pdf)
- Also see [Von Neumann, J., & Pierce, R. S. (1952). Lectures on probabilistic logics and the synthesis of reliable organisms from unreliable components. Pasadena, CA, USA: California institute of technology.](https://web.archive.org/web/20230623162146/https://web.mit.edu/6.454/www/papers/pierce_1952.pdf)
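To make the McCulloch-Pitts unit cited above concrete: it is a binary threshold gate that fires when the weighted sum of its inputs reaches a threshold. A minimal sketch, assuming NumPy; the weights and thresholds for AND/OR are illustrative choices, not taken from the paper.

```python
import numpy as np

def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold unit: fire (1) iff the weighted input sum reaches the threshold."""
    return int(np.dot(inputs, weights) >= threshold)

# Illustrative settings: with unit weights, threshold 2 gives AND, threshold 1 gives OR.
print(mcculloch_pitts([1, 1], weights=[1, 1], threshold=2))  # AND(1, 1) -> 1
print(mcculloch_pitts([1, 0], weights=[1, 1], threshold=2))  # AND(1, 0) -> 0
print(mcculloch_pitts([1, 0], weights=[1, 1], threshold=1))  # OR(1, 0)  -> 1
```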
- **Neural Fields** — This line of work is considerably more advanced than most of the other neural network architectures that follow.
- [Wiener, N., & Rosenblueth, A. (1946). The mathematical formulation of the problem of conduction of impulses in a network of connected excitable elements, specifically in cardiac muscle.](https://web.archive.org/web/20240903005221/https://itlab.us/pubs/wiener_rosenblueth.pdf)
- [Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of physiology, 148(3), 574.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130/)
- [Grossberg, S. (1967). Nonlinear difference-differential equations in prediction and learning theory. Proceedings of the National Academy of Sciences, 58(4), 1329-1334.](https://www.pnas.org/doi/pdf/10.1073/pnas.58.4.1329)
- [Grossberg, S. (1968). Global ratio limit theorems for some nonlinear functional-differential equations. I. Bulletin of the American Mathematical Society, 74(1), 95-99.](https://web.archive.org/web/20170817052506id_/http://www.ams.org/journals/bull/1968-74-01/S0002-9904-1968-11890-7/S0002-9904-1968-11890-7.pdf)
- [Grossberg, S. (1968). Global ratio limit theorems for some nonlinear functional-differential equations. II. Bulletin of the American Mathematical Society, 74(1), 100-105.](https://web.archive.org/web/20170819080031id_/http://www.ams.org/journals/bull/1968-74-01/S0002-9904-1968-11892-0/S0002-9904-1968-11892-0.pdf)
- [S. Grossberg. Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, Indiana University Journal of Mathematics and Mechanics, 19:53-91, 1969.](https://www.researchgate.net/publication/243618523_Some_Networks_That_Can_Learn_Remember_and_Reproduce_any_Number_of_Complicated_Space-Time_Patterns_I)
- [Grossberg, S. (1969). Embedding fields: A theory of learning with physiological implications. Journal of Mathematical Psychology, 6(2), 209-239.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b5b92f00117dd8effa808c71566a1da77d38ff98)
- [D. Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262:841, p 23-81, 1971.](https://web.archive.org/web/20160220032908/https://lnc.usc.edu/papers/marr1971.pdf)
- [Wilson, H. R., & Cowan, J. D. (1972). Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical journal, 12(1), 1-24.](https://www.cell.com/biophysj/pdf/S0006-3495(72)86068-5.pdf)
- [Wilson, H. R., & Cowan, J. D. (1973). A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13(2), 55-80.](https://web.archive.org/web/20210506174107/http://www.gatsby.ucl.ac.uk/~qhuys/theoretical_neuroscience/Wilson-Cowan73.pdf)
- [Amari, S. I. (1977). Neural theory of association and concept-formation. Biological cybernetics, 26(3), 175-185.](https://bsi-ni.brain.riken.jp/database/file/62/048.pdf) — also see the Amari net below
- [Amari, S. I., Yoshida, K., & Kanatani, K. I. (1977). A mathematical foundation for statistical neurodynamics. SIAM Journal on Applied Mathematics, 33(1), 95-126.](https://www.researchgate.net/publication/242912188_A_Mathematical_Foundation_for_Statistical_Neurodynamics)
- [Amari, S. I. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological cybernetics, 27(2), 77-87.](https://www.researchgate.net/profile/Shun-Ichi-Amari/publication/22242887_Dynamic_of_pattern_formation_in_lateral-inhibition_type_neural_fields/links/54ab1f8f0cf25c4c472f73df/Dynamic-of-pattern-formation-in-lateral-inhibition-type-neural-fields.pdf)
- Amari, S. I. (1983). Field theory of self-organizing neural nets. IEEE Transactions on Systems, Man, and Cybernetics, (5), 741-748.
- [Grossberg, S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural networks, 1(1), 17-61.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8ae6f957b6a4615ade96cb61a6d6b3e6083cadbd)
- [Sompolinsky, H., Crisanti, A., & Sommers, H. J. (1988). Chaos in random neural networks. Physical review letters, 61(3), 259.](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.61.259)
- [Amari, S. I. (1991). Mathematical theory of neural learning. New Generation Computing, 8, 281-294.](https://bsi-ni.brain.riken.jp/database/file/127/125.pdf)
- [Giese, M. A. (1999). Dynamic neural fields. In Dynamic Neural Field Theory for Motion Perception (pp. 49-63). Springer.](https://link.springer.com/chapter/10.1007/978-1-4615-5581-0_4)
- [Davis, C. J. (2010). The spatial coding model of visual word identification. Psychological review, 117(3), 713.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=78165e246f04045ab20adea389967a8978d510a8)
- [Zibner, S. K., Faubel, C., Iossifidis, I., & Schoner, G. (2011). Dynamic neural fields as building blocks of a cortex-inspired architecture for robotic scene representation. IEEE Transactions on Autonomous Mental Development, 3(1), 74-91.](https://www.ini.rub.de/upload/file/1470692849_d90a1e1fa22b06d0d7d8/ZibnerEtAl2011.pdf)
- [Alecu, L., Frezza-Buet, H., & Alexandre, F. (2011). Can self-organisation emerge through dynamic neural fields computation?. Connection Science, 23(1), 1-31.](https://www.tandfonline.com/doi/epdf/10.1080/09540091.2010.526194?needAccess=true)
- [Sandamirskaya, Y. (2014). Dynamic neural fields as a step toward cognitive neuromorphic architectures. Frontiers in neuroscience, 7, 276.](https://doi.org/10.3389/fnins.2013.00276)
- [Ferreira, F., Erlhagen, W., Sousa, E., Louro, L., & Bicho, E. (2014, October). Learning a musical sequence by observation: A robotics implementation of a dynamic neural field model. In 4th International Conference on Development and Learning and on Epigenetic Robotics (pp. 157-162). IEEE.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b4c6a1ca6b589763e456913f597c64fac5d7f950)
- [Strub, C., Schöner, G., Wörgötter, F., & Sandamirskaya, Y. (2017). Dynamic neural fields with intrinsic plasticity. Frontiers in computational neuroscience, 11, 74.](https://doi.org/10.3389/fncom.2017.00074)
- [Chow, C. C., & Karimipanah, Y. (2020). Before and beyond the Wilson–Cowan equations. Journal of neurophysiology, 123(5), 1645-1656.](https://doi.org/10.1152/jn.00404.2019)
- **Related effects unaccounted for in above models**
- [Buzsáki, G. (2010). Neural syntax: cell assemblies, synapsembles, and readers. Neuron, 68(3), 362-385.](https://doi.org/10.1016/j.neuron.2010.09.023)
- [McFadden, J. (2020). Integrating information in the brain’s EM field: the cemi field theory of consciousness. Neuroscience of consciousness, 2020(1), niaa016.](https://web.archive.org/web/20201028233228/https://academic.oup.com/nc/article/2020/1/niaa016/5909853)
- **Perceptron**
- Rosenblatt, F. (1957). The perceptron: a perceiving and recognizing automaton (Project Para). Report No. 85-460-1, Cornell Aeronautical Laboratory.
- Rosenblatt, F. The perceptron: A theory of statistical separability in cognitive systems. Buffalo: Cornell Aeronautical Laboratory, Inc. Rep. No. VG-1196-G-1, 1958.
- [Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.](https://web.archive.org/web/20230506225610/http://www2.denizyuret.com/bib/rosenblatt/Rosenblatt1958/frosenblatt.pdf)
- Joseph, R. D. (1961). Contributions to perceptron theory. PhD thesis, Cornell Univ.
- [Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan Books.](https://web.archive.org/web/20230501100311/https://apps.dtic.mil/sti/pdfs/AD0256582.pdf)
- [Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, (3), 299-307.](https://web.archive.org/web/20231216231715/https://people.idsia.ch/~juergen/amari1967.pdf)
- [Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological cybernetics, 59(4-5), 291-294.](https://infoscience.epfl.ch/record/82601)
- [Anlauf, J. K., & Biehl, M. (1989). The adatron: an adaptive perceptron algorithm. Europhysics Letters, 10(7), 687.](https://iopscience.iop.org/article/10.1209/0295-5075/10/7/014)
- [Gallant, S. I. (1990). Perceptron-based learning algorithms. IEEE Transactions on neural networks, 1(2), 179-191.](https://www.ling.upenn.edu/courses/cogs501/Gallant1990.pdf)
- [Amaldi, E. (1991). On the complexity of training perceptrons. In Artificial Neural Networks: Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN'91) (Vol. 1, pp. 55-60). North-Holland.](https://hdl.handle.net/11311/677346)
- [Frean, M. (1992). A "thermal" perceptron learning rule. Neural Computation, 4(6), 946-957.](https://web.archive.org/web/20170810045300/http://homepages.ecs.vuw.ac.nz/~marcus/manuscripts/Frean92-thermal-perceptron.pdf)
- [Wendemuth, A. (1995). Learning the unlearnable. Journal of Physics A: Mathematical and General, 28(18), 5423.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=f5f164693bb2d9646619c912f51bce18bcab2a78)
- [Freund, Y., & Schapire, R. E. (1998, July). Large margin classification using the perceptron algorithm. In Proceedings of the eleventh annual conference on Computational learning theory (pp. 209-217).](https://cseweb.ucsd.edu/~yfreund/papers/LargeMarginsUsingPerceptron.pdf)
- [Frieß, T. T., Cristianini, N., & Campbell, C. (1998, July). The kernel-adatron algorithm: a fast and simple learning procedure for support vector machines. In Machine learning: proceedings of the fifteenth international conference (ICML'98) (pp. 188-196).](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=5e761bc3b6028308dcd48f9ba0964533c2e6fe43)
- [Collins, M. (2002, July). Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 conference on empirical methods in natural language processing (EMNLP 2002) (pp. 1-8).](https://aclanthology.org/W02-1001.pdf)
- p-delta rule — [Auer, P., Burgsteiner, H., & Maass, W. (2008). A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural networks, 21(5), 786-795.](https://web.archive.org/web/20110706095227/http://www.igi.tugraz.at/harry/psfiles/biopdelta-07.pdf)
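A minimal sketch of the classic perceptron update rule from the Rosenblatt papers above (nudge the weights toward any misclassified example), assuming NumPy; the synthetic two-blob dataset and the hyperparameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable data: two Gaussian blobs with labels -1 and +1.
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
for _ in range(20):                      # epochs
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:       # misclassified (or on the boundary)
            w += yi * xi                 # perceptron update
            b += yi

print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```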
- **Adaline/Madaline**
- [Widrow, B. (1957). Propagation of statistics in systems. In IRE WESCON Convention Record, Part (Vol. 2, pp. 114-121).](https://web.archive.org/web/20230606185240/https://www-isl.stanford.edu/~widrow/papers/c1957propagationof.pdf)
- [Widrow, B. (1959). Adaptive sample data systems-a statistical theory of adaptation. 1959 WESCON convention record. Part, 4, 74-85.](https://web.archive.org/web/20230606185325/https://www-isl.stanford.edu/~widrow/papers/c1959adaptivesampled.pdf)
- Mattson, R. L. (1959). The design and analysis of an adaptive system for statistical classification (Doctoral dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering).
- [Mattson, R. L. (1959). A Self-Organizing Binary System. Eastern Joint Computer Conference Record. Institute for Research and Education. New-York.](https://web.archive.org/web/20230816121218/http://www.bitsavers.org/pdf/afips/1959-12_%2316.pdf#212)
- [Widrow, B. (1960, July). Adaptive sampled-data systems. In Proceedings of the First International Congress of the International Federation of Automatic Control (pp. 406-411).](https://web.archive.org/web/20100802044733/https://www-isl.stanford.edu/people/widrow/papers/c1960adaptivesampled.pdf)
- [Widrow, B., & Hoff, M. E. (1960, August). Adaptive switching circuits. In IRE WESCON convention record (Vol. 4, No. 1, pp. 96-104).](https://web.archive.org/web/20220419145429/https://isl.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf)
- [Widrow, B. (1960). An Adaptive "Adaline" Neuron Using Chemical "Memistors". Technical Report.](https://www-isl.stanford.edu/~widrow/papers/t1960anadaptive.pdf)
- Widrow, B. (1962). Generalization and information storage in networks of adaline neurons. Self-organizing systems, 435-461.
- [Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE transactions on electronic computers, (3), 326-334.](https://www-isl.stanford.edu/people/cover/papers/paper2.pdf) — goes over some of the geometrical connections applicable to machine learning
- [Winter, C. R., & Widrow, B. (1988, July). MADALINE RULE II: A training algorithm for neural networks. In Second Annual International Conference on Neural Networks (Vol. 1, pp. 401-408).](https://www-isl.stanford.edu/~widrow/papers/c1988madalinerule.pdf)
- [Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychological review, 88(2), 135.](http://incompleteideas.net/papers/sutton-barto-81-PsychRev.pdf) — Incorporates ideas from the neural field.
- [Widrow, B., & Lehr, M. A. (1990). 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415-1442.](https://www-isl.stanford.edu/people/widrow/papers/j199030years.pdf)
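A minimal sketch of the Widrow-Hoff LMS ("delta") rule behind Adaline: unlike the perceptron, the update is driven by the error of the linear output before any thresholding. Assumes NumPy; data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy linear targets (illustrative); Adaline should recover true_w.
X = rng.normal(size=(200, 3))
true_w = np.array([0.5, -2.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w, lr = np.zeros(3), 0.01
for _ in range(50):
    for xi, yi in zip(X, y):
        err = yi - xi @ w        # error on the linear output, not the thresholded one
        w += lr * err * xi       # LMS / delta-rule update

print("learned weights:", w.round(2), "target:", true_w)
```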
- **[Lernmatrix](https://es.wikipedia.org/wiki/Lernmatrix)**
- K. Steinbuch. Die Lernmatrix. (The learning matrix.) Kybernetik, 1(1):36-45, 1961.
- [Steinbuch, K. (1965). Adaptive networks using learning matrices. Kybernetik, 2(4), 148-152.](https://doi.org/10.1007/BF00272311)
- [Román-Godínez, I., López-Yáñez, I., & Yáñez-Márquez, C. (2007). Perfect recall on the lernmatrix. In Advances in Neural Networks – ISNN 2007: 4th International Symposium on Neural Networks, ISNN 2007, Nanjing, China, June 3-7, 2007, Proceedings, Part II (pp. 835-841). Springer Berlin Heidelberg.](https://www.researchgate.net/publication/220869883_Perfect_Recall_on_the_Lernmatrix)
- [Aldape-Pérez, M., Román-Godínez, I., & Camacho-Nieto, O. (2008). Thresholded learning matrix for efficient pattern recalling. In Progress in Pattern Recognition, Image Analysis and Applications: 13th Iberoamerican Congress on Pattern Recognition, CIARP 2008, Havana, Cuba, September 9-12, 2008. Proceedings 13 (pp. 445-452). Springer Berlin Heidelberg.](https://web.archive.org/web/20240902122714/https://d1wqtxts1xzle7.cloudfront.net/115021072/10.1007_2F978-3-540-85920-8_55-libre.pdf?1716324249=&response-content-disposition=inline%3B+filename%3DThresholded_Learning_Matrix_for_Efficien.pdf&Expires=1725283626&Signature=Uc3oxXBpOL472TR0oeBQaI2jziBx8fk~fKXpBdX8xfF0dzKkYisZ-QLu2nOJDhQzIhHkuS7gvb3Wd86ogUnA294Q9JTHK0Pbjv-SmtGyrlMkQVelqoOy9bnbOdKGPKewj-umzVji9v4PK4yAb6agbyUNYp9yn9BM4kDG-G3DK3Chype2DZDxUHUqMTjjPNMbGhpx3q1kGKfU1T3GE4m6hgwS3Pcy9irOSp-OdCdFaFVQ3q0xVhxfCIVtSiYLCjrcBBmUdM0w26lU86llEhkm8v5Klffp52fxXCWbDZkm0j3xaOqJZ-GO4VA4fPpClyhNqPboWnxwJ6qC9gekomP23A__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA)
- [Juan Carbajal Hernández, J., & Sánchez Fernández, L. P. (2011). New algorithm for efficient pattern recall using a static threshold with the Steinbuch Lernmatrix. Connection Science, 23(1), 33-44.](https://www.tandfonline.com/doi/pdf/10.1080/09540091.2011.557716)
- **Self-Organizing Maps (SOM)**
- T. Kohonen. Correlation Matrix Memories. IEEE Transactions on Computers, C-21, p. 353-359, 1972.
- [Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69.](https://www.cnbc.cmu.edu/~tai/nc19journalclubs/Kohonen1982_Article_Self-organizedFormationOfTopol.pdf)
- T. Kohonen. Self-Organization and Associative Memory. Springer, second edition, 1988.
- H. Ritter, T. Kohonen. Self-organizing semantic maps. Biological Cybernetics, 61(4):241-254, 1989.
- [Fort, J. C. (2006). SOM's mathematics. Neural Networks, 19(6-7), 812-816.](http://samos.univ-paris1.fr/archives/WSOM05/papers/WSOM2005-128.pdf)
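A minimal sketch of the Kohonen self-organizing map from the 1982 paper above: find the best-matching unit, then pull it and its map neighbours toward the input with a shrinking neighbourhood. Assumes NumPy; a one-dimensional map, uniform 2-D inputs, and illustrative decay schedules.

```python
import numpy as np

rng = np.random.default_rng(2)

n_units, n_steps = 20, 5000
weights = rng.random((n_units, 2))                        # 1-D map of 2-D weight vectors
for t in range(n_steps):
    x = rng.random(2)                                     # sample from the unit square
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
    lr = 0.5 * (1 - t / n_steps)                          # decaying learning rate
    sigma = max(3.0 * (1 - t / n_steps), 0.5)             # decaying neighbourhood width
    dist = np.abs(np.arange(n_units) - bmu)               # distance along the map
    h = np.exp(-dist ** 2 / (2 * sigma ** 2))             # neighbourhood function
    weights += lr * h[:, None] * (x - weights)

print(weights.round(2))   # neighbouring units should end up with nearby weight vectors
```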
- K. Nakano. Associatron—A Model of Associative Memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2:3 p. 380-388, 1972.
- **[Spin-glass](https://en.wikipedia.org/w/index.php?title=Spin_glass&oldid=1089309646) and [Ising Model](https://en.wikipedia.org/w/index.php?title=Ising_model&oldid=1087414113)**
- **Physics Background**
- W. Lenz (1920). Beitrag zum Verständnis der magnetischen Erscheinungen in festen Körpern. (Contribution to the understanding of magnetic phenomena in solid bodies.) Physikalische Zeitschrift, 21:613-615.
- E. Ising. Beitrag zur Theorie des Ferro- und Paramagnetismus. (Contribution to the theory of ferro- and paramagnetism.) Dissertation, 1924.
- E. Ising (1925). Beitrag zur Theorie des Ferromagnetismus. (Contribution to the theory of ferromagnetism.) Zeitschrift für Physik, 31(1):253-258.
- H. A. Kramers and G. H. Wannier (1941). Statistics of the Two-Dimensional Ferromagnet. Phys. Rev. 60, 252 and 263, 1941.
- G. H. Wannier (1945). The Statistical Problem in Cooperative Phenomena. Rev. Mod. Phys. 17, 50.
- [Brush, S. G. (1967). History of the Lenz-Ising model. Reviews of modern physics, 39(4), 883.](https://web.archive.org/web/20221208092414/http://personal.rhul.ac.uk/uhap/027/ph4211/PH4211_files/brush67.pdf)
- [Niss, M. (2005). History of the Lenz-Ising model 1920–1950: from ferromagnetic to cooperative phenomena. Archive for history of exact sciences, 59, 267-318.](https://web.archive.org/web/20211022100509/https://verga.cpt.univ-mrs.fr/pdfs/Niss-2005.pdf)
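For readers who want to play with the Lenz-Ising model itself, a minimal Metropolis sampler for a small 2-D ferromagnet with no external field, assuming NumPy; lattice size, temperature, and sweep count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

L, beta, sweeps = 16, 0.6, 200                 # lattice size, inverse temperature, sweeps
spins = rng.choice([-1, 1], size=(L, L))

for _ in range(sweeps * L * L):
    i, j = rng.integers(L, size=2)
    # Sum of the four nearest neighbours, with periodic boundaries.
    nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
          + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    dE = 2 * spins[i, j] * nb                  # energy change if spin (i, j) is flipped
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        spins[i, j] *= -1                      # Metropolis acceptance

print("magnetization per spin:", spins.mean())  # near +/-1 below the critical temperature
```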
- **Amari**
- [Amari, S. I. (1971). Characteristics of randomly connected threshold-element networks and network systems. Proceedings of the IEEE, 59(1), 35-47.](https://bsi-ni.brain.riken.jp/database/item/73)
- Amari, S. I. (1972). Random nets consisting of excitatory and inhibitory neuron-like elements. IEICE Transactions, 55, 179-185.
- [Amari, S. I. (1972). Characteristics of random nets of analog neuron-like elements. IEEE Transactions on systems, man, and cybernetics, (5), 643-657.](https://bsi-ni.brain.riken.jp/database/file/56/039.pdf)
- [Amari, S. I. (1972). Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11), 1197-1206.](https://web.archive.org/web/20231219144029/https://people.idsia.ch/~juergen/amari1972hopfield.pdf) — The paper that started the Amari-Hopfield net
- [Amari, S. I. (1975). Homogeneous nets of neuron-like elements. Biological cybernetics, 17(4), 211-220.](https://bsi-ni.brain.riken.jp/database/file/60/043.pdf)
- **Little**
- [Little, W. A. (1974). The existence of persistent states in the brain. Mathematical biosciences, 19(1-2), 101-120.](https://web.archive.org/web/20230815031215/http://wexler.free.fr/library/files/little%20(1974)%20the%20existence%20of%20persistent%20states%20in%20the%20brain.pdf)
- > We show that given certain plausible assumptions the existence of persistent states in a neural network can occur only if a certain transfer matrix has degenerate maximum eigenvalues. The existence of such states of persistent order is directly analogous to the existence of long range order in an Ising spin system; while the transition to the state of persistent order is analogous to the transition to the ordered phase of the spin system. It is shown that the persistent state is also characterized by correlations between neurons throughout the brain. It is suggested that these persistent states are associated with short term memory while the eigenvectors of the transfer matrix are a representation of long term memory. A numerical example is given that illustrates certain of these features.
- [Little, W. A. (1980). An ising model of a neural network. In Biological Growth and Spread (pp. 173-179). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-61850-5_18)
- > We show that the behavior of a neural network can be mapped onto a generalized spin Ising model. The conditions for the existence of short term memory are related to the existence of long range order in the corresponding Ising model. Even in the presence of noise and fluctuation in the properties of the network, precisely defined states can exist nevertheless. Long term memory appears to result from the modification of the synaptic junctions as a result of signals propagating through the network. The essentially non-linear problem can be linearized and leads to a holographic-like storage of information and means for recall.
- **Hopfield Net**
- [Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8), 2554-2558.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC346238/)
- Cites Little and Amari; it even reuses one of Little's citations in an example.
- [Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the national academy of sciences, 81(10), 3088-3092.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC345226/)
- [Hopfield, J. J., & Tank, D. W. (1985). "Neural" computation of decisions in optimization problems. Biological cybernetics, 52(3), 141-152.](https://axon.cs.byu.edu/~martinez/classes/778/Papers/hopfield_tank.pdf)
- [Storkey, A. J., & Valabregue, R. (1999). The basins of attraction of a new Hopfield learning rule. Neural Networks, 12(6), 869-876.](https://citeseerx.ist.psu.edu/doc/10.1.1.19.4681)
- [A. P. Millan, J. J. Torres, J. Marro. How Memory Conforms to Brain Development. Front. Comput. Neuroscience, 2019](https://doi.org/10.3389/fncom.2019.00022)
- [Krotov, D., & Hopfield, J. J. (2016). Dense associative memory for pattern recognition. Advances in neural information processing systems, 29.](https://arxiv.org/abs/1606.01164)
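A minimal sketch of the Amari-Hopfield associative memory described in the entries above: Hebbian outer-product storage, asynchronous threshold updates, and an Ising-style energy that decreases during recall. Assumes NumPy; pattern count, size, and corruption level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 64
patterns = rng.choice([-1, 1], size=(2, n))            # two random bipolar patterns
W = sum(np.outer(p, p) for p in patterns) / n          # Hebbian outer-product rule
np.fill_diagonal(W, 0)                                 # no self-connections

state = patterns[0].copy()
state[rng.choice(n, size=10, replace=False)] *= -1     # corrupt 10 bits of the cue

for _ in range(5 * n):                                 # asynchronous updates
    i = rng.integers(n)
    state[i] = 1 if W[i] @ state >= 0 else -1

energy = -0.5 * state @ W @ state                      # Ising-style energy
print("pattern recovered:", np.array_equal(state, patterns[0]), "energy:", energy)
```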
- **Newer Developments**
- [Amit, Daniel J., Hanoch Gutfreund, and Haim Sompolinsky. "Spin-glass models of neural networks." Physical Review A 32.2 (1985): 1007.](https://www.researchgate.net/publication/283617465_Spin-glass_models_of_neural_networks)
- Remarks on the similarities between the Hopfield and Little neural network architectures.
- [van Hemmen, J. L. (1986). Spin-glass models of a neural network. Physical Review A, 34(4), 3435.](https://doi.org/10.1103/PhysRevA.34.3435)
- [Gardner, E., & Derrida, B. (1988). Optimal storage properties of neural network models. Journal of Physics A: Mathematical and general, 21(1), 271.](https://hal.archives-ouvertes.fr/hal-03285587/file/Optimal%20storage%20properties%20of%20neural%20network%20models.pdf)
- [Sherrington, D. (1993). Neural networks: the spin glass approach. In North-Holland Mathematical Library (Vol. 51, pp. 261-291). Elsevier.](https://web.archive.org/web/20231229080726/https://d1wqtxts1xzle7.cloudfront.net/43229638/On-Line_Learning_Processes_in_Artificial20160301-1491-l4cs33-libre.pdf?1456822244=&response-content-disposition=inline%3B+filename%3DOn_line_learning_processes_in_artificial.pdf&Expires=1703840752&Signature=Ch31UDVvbVhObMEIm6S--bKnaRILBrdw6uMgMHhkWaBS04EMkf4c7~o52K7MCzF6y8IzZQgeWVU4xxwouCV7qGQHhdHMRIN5dUsHpERepfYXh3ugRBm77Rcs3BMnyFkoFNX2NIUXiTXyZEFGKgN882bYggTIOn8HcDo4lsDoTjA8fVO53qK~W4DID4UjB0WGDTwfsS4xUmAPtP3B3A3wL9olzDMyYiVYCqs~WnoxWF0ZmZD~lX51LgD4K8x35J9qOKPKf7MMqljD6mqkcXrx1ZBfvxuq66PDsCHMZhxU12jzYGjEeEpzmJXcP8KFVkhG8OJvB7ty2TpSH9x2aO-eag__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA#page=270)
- > Spin glasses are disordered magnetic systems. Their relevance to neural networks lies not in any physical similarity, but rather in conceptual analogy and in the transfer of mathematical techniques developed for their analysis to the quantitative study of several aspects of neural networks. This chapter is concerned with the basis and application of this transfer.
- > The recognition of a conceptual relationship between spin glasses and recurrent neural networks, together with a mathematical mapping between idealizations of each (Hopfield 1982), provided the first hint of what has turned out to be a fruitful transplantation.
- [Schaap, H. G. (2005). Ising models and neural networks. University Library Groningen[Host].](http://www.yaroslavvb.com/papers/schaap-ising.pdf)
- [Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007-1012.](https://doi.org/10.1038/nature04701)
- [Roudi, Y., Tyrcha, J., & Hertz, J. (2009). Ising model for neural data: model quality and approximate methods for extracting functional connectivity. Physical Review E, 79(5), 051915.](https://doi.org/10.1103/physreve.79.051915)
- [Yoshioka, M. (2009). Learning of spatiotemporal patterns in Ising-spin neural networks: analysis of storage capacity by path integral methods. Physical review letters, 102(15), 158102.](https://doi.org/10.1103/physrevlett.102.158102)
- [Witoelar, A., & Roudi, Y. (2011). Neural network reconstruction using kinetic Ising models with memory. BMC Neuroscience, 12(1), 1-2.](https://doi.org/10.1186/1471-2202-12-S1-P274)
- [Pai, S. (2016) Convolutional Neural Networks Arise From Ising Models and Restricted Boltzmann Machines. Stanford University, APPPHYS 293 Term Paper. 7 June.](https://web.stanford.edu/~sunilpai/convolutional-neural-networks.pdf)
- [Yamamoto, Y., Aihara, K., Leleu, T., Kawarabayashi, K. I., Kako, S., Fejer, M., ... & Takesue, H. (2017). Coherent Ising machines—Optical neural networks operating at the quantum limit. npj Quantum Information, 3(1), 1-15.](https://doi.org/10.1038/s41534-017-0048-9)
- [Morningstar, A., & Melko, R. G. (2018). Deep Learning the Ising Model Near Criticality. Journal of Machine Learning Research, 18(163), 1-17.](https://dl.acm.org/doi/pdf/10.5555/3122009.3242020)
- [Efthymiou, S., Beach, M. J., & Melko, R. G. (2019). Super-resolving the Ising model with convolutional neural networks. Physical Review B, 99(7), 075113.](https://doi.org/10.1103/PhysRevB.99.075113)
- [Talalaev, D. V. (2020, October). Hopfield neural network and anisotropic Ising model. In International Conference on Neuroinformatics (pp. 381-386). Springer, Cham.](https://doi.org/10.1007/978-3-030-60577-3_45)
- [D'Angelo, F., & Böttcher, L. (2020). Learning the Ising model with generative neural networks. arXiv preprint arXiv:2001.05361.](https://doi.org/10.1103/PhysRevResearch.2.023266)
- [Aguilera, M., Moosavi, S. A., & Shimazaki, H. (2021). A unifying framework for mean-field theories of asymmetric kinetic Ising systems. Nature communications, 12(1), 1-12.](https://doi.org/10.1038/s41467-021-20890-5)
- [Kara, O., Sehanobish, A., & Corzo, H. H. (2021). Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models. arXiv preprint arXiv:2109.13925.](https://arxiv.org/abs/2109.13925)
- [Zhang, Y. (2021). Ising spin configurations with the deep learning method. Journal of Physics Communications, 5(1), 015006.](https://doi.org/10.1088/2399-6528/abd7c3)
- **Adaptive Resonance**
- [Ellias, S. A., & Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20(2), 69-98.](https://web.archive.org/web/20230103161030/http://techlab.bu.edu/files/resources/articles_cns/EllGro1975BiolCyb.pdf)
- STM — [Grossberg, S. (1982). Contour enhancement, short term memory, and constancies in reverberating neural networks. In Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (pp. 332-378).](https://web.archive.org/web/20231230193212/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b7dd395303b4d1c241bfec7c3ec3b56294830630)
- ART — [Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer vision, graphics, and image processing, 37(1), 54-115.](https://web.archive.org/web/20231230191300/https://d1wqtxts1xzle7.cloudfront.net/50945366/s0734-189x_2887_2980014-220161217-29007-b2dl45-libre.pdf?1482048333=&response-content-disposition=inline%3B+filename%3DA_massively_parallel_architecture_for_a.pdf&Expires=1703967123&Signature=b~~pUtnzyzc~ISnFCOxfu37gelaVRdDSD6uyIIOdliH9nPbmTeHoB-XeB0kLDibIhIeGb78LRD2mujneAl7cmKFk7LYHUe0So8EFJRqog5iBinke~CxLy1Ev7qZPIPLkCCVEAP6nknD2-6X7hwGs88ZUwKtMB9kJDYdCohGp8lPgeJbYIXMttvTYMaaVo9y00749zjpaJWcjRU6VMx6uq8IPEXH5hkISjKamBWbT-rYXFPV-IhiEQuqHcTVQdCyNiDVr7JskeO1PoR-WlNEn04u6paKNbNFWpGoCL3sLydGdgOHlSEZkjFC4d24MxZU5WCWp2pAOxsPwumwKK6QW3A__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA) — Mentions attention.
- [Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive science, 11(1), 23-63.](https://web.archive.org/web/20060907041202/http://www.cns.bu.edu/Profiles/Grossberg/Gro1987CogSci.pdf)
- [Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.](https://web.archive.org/web/20060904212143/http://cns-web.bu.edu/Profiles/Grossberg/CarGro1987AppliedOptics.pdf)
- [Carpenter, G. A., & Grossberg, S. (1990). ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural networks, 3(2), 129-152.](https://web.archive.org/web/20060906014656/http://cns.bu.edu/Profiles/Grossberg/CarGro1990NN.pdf)
- [Carpenter, G. A., Grossberg, S., & Rosen, D. B. (1991). ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition. Neural networks, 4(4), 493-504.](https://web.archive.org/web/20060519092850/http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNART2A.pdf)
- [Carpenter, G. A., Grossberg, S., & Reynolds, J. H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural networks, 4(5), 565-588.](https://web.archive.org/web/20060519091848/http://cns.bu.edu/Profiles/Grossberg/CarGroRey1991NN.pdf)
- [Carpenter, G. A., Grossberg, S., & Rosen, D. B. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural networks, 4(6), 759-771.](https://web.archive.org/web/20060519091505/http://cns.bu.edu/Profiles/Grossberg/CarGroRos1991NNFuzzyART.pdf)
- [Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., & Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on neural networks, 3(5), 698-713.](https://web.archive.org/web/20060519094345/http://cns.bu.edu/Profiles/Grossberg/CarGroMarRey1992IEEETransNN.pdf)
- [Tan, A. H. (1995). Adaptive resonance associative map. Neural Networks, 8(3), 437-446.](https://web.archive.org/web/20210812140440/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=6227&context=sis_research)
- [Williamson, J. R. (1996). Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps. Neural networks, 9(5), 881-897.](https://web.archive.org/web/20170903120705/https://open.bu.edu/bitstream/handle/2144/2180/95.003.pdf?sequence=1&isAllowed=y)
- [Carpenter, G., & Grossberg, S. (1998). Adaptive resonance theory. Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems.](https://web.archive.org/web/20060519091948/http://cns.bu.edu/Profiles/Grossberg/CarGro2003HBTNN2.pdf)
- [Anagnostopoulos, Georgios C., and M. Georgiopulos. "Hypersphere ART and ARTMAP for unsupervised and supervised, incremental learning." Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium. Vol. 6. IEEE, 2000.](https://web.archive.org/web/20230103160314/http://techlab.bu.edu/files/resources/articles_tt/Anagnostopoulos_Georgiopoulos_2000.pdf)
- Applies SOM techniques to ART — [Tan, A. H. (2006, May). Self-organizing neural architecture for reinforcement learning. In International Symposium on Neural Networks (pp. 470-475). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://www.researchgate.net/profile/Ah-Hwee-Tan/publication/220870704_Self-organizing_Neural_Architecture_for_Reinforcement_Learning/links/54b924590cf2c27adc491724/Self-organizing-Neural-Architecture-for-Reinforcement-Learning.pdf)
- [Tan, A. H., Carpenter, G. A., & Grossberg, S. (2007, June). Intelligence through interaction: Towards a unified theory for learning. In International Symposium on Neural Networks (pp. 1094-1103). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://web.archive.org/web/20231113063613/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=7561&context=sis_research)
- [Tscherepanow, M. (2010, September). TopoART: A topology learning hierarchical ART network. In International Conference on Artificial Neural Networks (pp. 157-167). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://web.archive.org/web/20230905002617/https://pub.uni-bielefeld.de/download/1925596/2499061)
- [Tscherepanow, M. (2012). Incremental On-line Clustering with a Topology-Learning Hierarchical ART Neural Network Using Hyperspherical Categories. In ICDM (Poster and Industry Proceedings) (pp. 22-34).](https://web.archive.org/web/20220402114215/https://pub.uni-bielefeld.de/download/2498997/2517690/tscherepanow.marko2012incremental-ICDM.pdf)
- [Tan, A. H., Subagdja, B., Wang, D., & Meng, L. (2019). Self-organizing neural networks for universal learning and multimodal memory encoding. Neural Networks, 120, 58-73.](https://web.archive.org/web/20231113061819/https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=6206&context=sis_research)
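A minimal sketch of Fuzzy ART (Carpenter, Grossberg & Rosen 1991, above): complement coding, a choice function to rank categories, a vigilance test, and fast learning. Assumes NumPy; the vigilance value and toy data are illustrative.

```python
import numpy as np

def fuzzy_art(data, rho=0.75, alpha=0.001):
    """Minimal Fuzzy ART with fast learning (beta = 1). Inputs must lie in [0, 1]."""
    categories, labels = [], []
    for x in data:
        I = np.concatenate([x, 1 - x])                    # complement coding
        # Rank existing categories by the choice function |I ^ w_j| / (alpha + |w_j|).
        order = sorted(range(len(categories)),
                       key=lambda j: -np.minimum(I, categories[j]).sum()
                                      / (alpha + categories[j].sum()))
        for j in order:
            match = np.minimum(I, categories[j]).sum() / I.sum()
            if match >= rho:                              # vigilance test -> resonance
                categories[j] = np.minimum(I, categories[j])   # fast learning
                labels.append(j)
                break
        else:                                             # no resonance -> new category
            categories.append(I.copy())
            labels.append(len(categories) - 1)
    return labels, categories

data = np.array([[0.10, 0.10], [0.15, 0.12], [0.90, 0.85], [0.88, 0.90]])
print(fuzzy_art(data)[0])   # nearby points should share a category, e.g. [0, 0, 1, 1]
```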
- **Winner-take-all**
- Amari, Shun-Ichi, and Michael A. Arbib. "Competition and cooperation in neural nets." Systems neuroscience (1977): 119-165.
- [Lazzaro, J., Ryckebusch, S., Mahowald, M. A., & Mead, C. A. (1988). Winner-take-all networks of O(n) complexity. Advances in neural information processing systems, 1.](https://web.archive.org/web/20180729163828/http://www.dtic.mil/dtic/tr/fulltext/u2/a451466.pdf)
- Yuille, A. L., & Grzywacz, N. M. (1989). A winner-take-all mechanism based on presynaptic inhibition feedback. Neural Computation, 1(3), 334-347.
- [Coultrip, R., Granger, R., & Lynch, G. (1992). A cortical model of winner-take-all competition via lateral inhibition. Neural networks, 5(1), 47-54.](https://www.researchgate.net/profile/Richard-Granger/publication/222066408_A_cortical_model_of_winner-take-all_competition_via_lateral_inhibition/links/5e0f58c7a6fdcc2837550904/A-cortical-model-of-winner-take-all-competition-via-lateral-inhibition.pdf)
- [Kaski, S., & Kohonen, T. (1994). Winner-take-all networks for physiological models of competitive learning. Neural Networks, 7(6-7), 973-984.](https://doi.org/10.1016/S0893-6080(05)80154-6)
- [Fang, Y., Cohen, M. A., & Kincaid, T. G. (1996). Dynamics of a winner-take-all neural network. Neural Networks, 9(7), 1141-1154.](https://web.archive.org/web/20230512223643/http://www.fang.ece.ufl.edu/mypaper/nn96.pdf)
- [Starzyk, J. A., & Jan, Y. W. (1996, August). A voltage based winner takes all circuit for analog neural networks. In Proceedings of the 39th Midwest Symposium on Circuits and Systems (Vol. 1, pp. 501-504). IEEE.](https://www.researchgate.net/profile/Janusz-Starzyk/publication/3690552_A_voltage_based_winner_takes_all_circuit_for_analog_neural_networks/links/004635212186f4b2e4000000/A-voltage-based-winner-takes-all-circuit-for-analog-neural-networks.pdf)
- [Maass, W. (2000). On the computational power of winner-take-all. Neural computation, 12(11), 2519-2535.](https://web.archive.org/web/20231230194039/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=ae4c6cdcbbf1786b41cc6581fcf5eebef9d74986)
- [Oster, M., Douglas, R., & Liu, S. C. (2009). Computation with spikes in a winner-take-all network. Neural computation, 21(9), 2437-2465.](https://web.archive.org/web/20181103205805/https://www.zora.uzh.ch/id/eprint/32038/1/neco.2009.07-08-829.pdf)
- [Handrich, S., Herzog, A., Wolf, A., & Herrmann, C. S. (2009, September). A biologically plausible winner-takes-all architecture. In International Conference on Intelligent Computing (pp. 315-326). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://doi.org/10.1007/978-3-642-04020-7_34)
- [Chen, Y. (2017). Mechanisms of winner-take-all and group selection in neuronal spiking networks. Frontiers in computational neuroscience, 11, 20.](https://doi.org/10.3389/fncom.2017.00020)
- [Lynch, N., Musco, C., & Parter, M. (2019). Winner-take-all computation in spiking neural networks. arXiv preprint arXiv:1904.12591.](https://arxiv.org/abs/1904.12591)
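A minimal sketch of a winner-take-all circuit of the kind analyzed above, using leaky competitive dynamics with global lateral inhibition; the inhibition gain, step size, and iteration count are illustrative, and NumPy is assumed.

```python
import numpy as np

def winner_take_all(inputs, inhibition=2.0, dt=0.1, steps=500):
    """Each unit is driven by its input and inhibited by the summed activity of the others."""
    x = np.asarray(inputs, dtype=float)
    a = np.zeros_like(x)
    for _ in range(steps):
        a = np.maximum(0.0, a + dt * (-a + x - inhibition * (a.sum() - a)))
    return a

print(winner_take_all([0.3, 0.9, 0.5]))   # only the unit with the largest input stays active
```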
- **Convolutional Neural Nets**
- ReLU — Fukushima, K. (1969). Visual feature extraction by a multilayered network of analog threshold elements. IEEE Transactions on Systems Science and Cybernetics, 5(4), 322-333.
- [Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, 36(4), 193-202.](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf)
- [Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328-339.](https://web.archive.org/web/20230204092552/https://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf)
- [Zhang, W., Itoh, K., Tanida, J., & Ichioka, Y. (1990). Parallel distributed processing model with local space-invariant interconnections and its optical architecture. Applied optics, 29(32), 4790-4797.](https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?resourcekey=0-WZ9_n1lL8vmwKdDTohTbSQ)
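A minimal sketch of the two operations at the heart of the Neocognitron/LeNet lineage above: a convolutional feature map with a ReLU nonlinearity (cf. Fukushima 1969) followed by 2x2 max pooling. Assumes NumPy; the image and kernel are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4] = 1.0                                      # a vertical edge
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])          # crude vertical-edge detector
feature = np.maximum(0.0, conv2d(image, kernel))       # ReLU
h, w = (feature.shape[0] // 2) * 2, (feature.shape[1] // 2) * 2
pooled = feature[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))   # 2x2 max pooling
print(pooled)
```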
- **Elman Net**
- [Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2), 179-211.](https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1)
- [Liou, C. Y., Huang, J. C., & Yang, W. C. (2008). Modeling word perception using the Elman network. Neurocomputing, 71(16-18), 3150-3157.](http://ntur.lib.ntu.edu.tw//handle/246246/155195)
- [Liou, C. Y., Cheng, W. C., Liou, J. W., & Liou, D. R. (2014). Autoencoder for words. Neurocomputing, 139, 84-96.](https://doi.org/10.1016/j.neucom.2013.09.055)
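A minimal sketch of the defining feature of the Elman network above: a context layer that is simply a copy of the previous hidden state, fed back as extra input. Forward pass only, random weights, illustrative layer sizes; NumPy assumed.

```python
import numpy as np

rng = np.random.default_rng(5)

n_in, n_hidden, n_out = 3, 5, 2
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))      # input   -> hidden
W_ch = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context -> hidden
W_hy = rng.normal(scale=0.5, size=(n_out, n_hidden))     # hidden  -> output

def run(sequence):
    context, outputs = np.zeros(n_hidden), []
    for x in sequence:
        hidden = np.tanh(W_xh @ x + W_ch @ context)
        outputs.append(W_hy @ hidden)
        context = hidden                                 # copy back for the next step
    return outputs

print(run([rng.normal(size=n_in) for _ in range(4)]))
```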
- **[Transfer learning](https://en.wikipedia.org/wiki/Transfer_learning)**
- Bozinovski, S., & Fulgosi, A. (1976). The influence of pattern similarity and transfer learning upon training of a base perceptron b2. In Proceedings of Symposium Informatica (Vol. 3, pp. 121-126).
- [Bozinovski, S. (1981). Teaching space: A representation concept for adaptive pattern classification. COINS Technical Report No. 81-28.](https://web.archive.org/web/20230307031828/https://web.cs.umass.edu/publication/docs/1981/UM-CS-1981-028.pdf)
- [Pratt, L. Y. (1992). Discriminability-based transfer between neural networks. Advances in neural information processing systems, 5.](https://proceedings.neurips.cc/paper/1992/file/67e103b0761e60683e83c559be18d40c-Paper.pdf)
- [Pratt, L., & Jennings, B. (1996). A survey of transfer between connectionist networks. Connection Science, 8(2), 163-184.](https://doi.org/10.1080/095400996116866)
- [Thrun, S., & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn (pp. 3-17). Boston, MA: Springer US.](https://www.google.com/books/edition/Learning_to_Learn/X_jpBwAAQBAJ?hl=en)
- [Do, C. B., & Ng, A. Y. (2005). Transfer learning for text classification. Advances in neural information processing systems, 18.](https://proceedings.neurips.cc/paper_files/paper/2005/file/bf2fb7d1825a1df3ca308ad0bf48591e-Paper.pdf)
- [Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks?. Advances in neural information processing systems, 27.](https://arxiv.org/abs/1411.1792)
- [Huh, M., Agrawal, P., & Efros, A. A. (2016). What makes ImageNet good for transfer learning?. arXiv preprint arXiv:1608.08614.](https://arxiv.org/abs/1608.08614)
- [Bozinovski, S. (2020). Reminder of the first paper on transfer learning in neural networks, 1976. Informatica, 44(3).](https://web.archive.org/web/20230721234607/https://www.informatica.si/index.php/informatica/article/viewFile/2828/1433)
- [Zoph, B., Ghiasi, G., Lin, T. Y., Cui, Y., Liu, H., Cubuk, E. D., & Le, Q. (2020). Rethinking pre-training and self-training. Advances in neural information processing systems, 33, 3833-3845.](https://proceedings.neurips.cc/paper/2020/file/27e9661e033a73a6ad8cefcde965c54d-Paper.pdf)
- **Backpropagation**
- [Linnainmaa, S. (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, 16(2), 146-160.](https://web.archive.org/web/20231229080145/https://papers.baulab.info/papers/also/Linnainmaa-1976.pdf)
- [Werbos, P. J. (2005, September). Applications of advances in nonlinear sensitivity analysis. In System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31–September 4, 1981 (pp. 762-770). Berlin, Heidelberg: Springer Berlin Heidelberg.](https://www.researchgate.net/publication/225785177_Applications_of_advances_in_nonlinear_sensitivity_analysis)
- [Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. California Univ San Diego La Jolla Inst for Cognitive Science.](https://apps.dtic.mil/dtic/tr/fulltext/u2/a164453.pdf)
- [Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. nature, 323(6088), 533-536.](http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf)
- [Pineda, F. (1987). Generalization of back propagation to recurrent and higher order neural networks. In Neural information processing systems.](https://proceedings.neurips.cc/paper/1987/hash/735b90b4568125ed6c3f678819b6e058-Abstract.html)
- [Krauth, W., & Mézard, M. (1987). Learning algorithms with optimal stability in neural networks. Journal of Physics A: Mathematical and General, 20(11), L745.](http://www.lptms.u-psud.fr/membres/mezard/Pdf/87_MK_JPA.pdf)
- P. W. Munro. A dual back-propagation scheme for scalar reinforcement learning. Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, WA, pages 165-176, 1987.
- M. Mozer. A Focused Backpropagation Algorithm for Temporal Pattern Recognition. Complex Systems, 1989.
- [LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551.](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf)
- [Griewank, A. (2012). Who invented the reverse mode of differentiation. Documenta Mathematica, Extra Volume ISMP, 389400.](https://web.archive.org/web/20231016161151/https://ftp.gwdg.de/pub/misc/EMIS/journals/DMJDMV/vol-ismp/52_griewank-andreas-b.pdf)
- [Whittington, J. C., & Bogacz, R. (2017). An approximation of the error backpropagation algorithm in a predictive coding network with local hebbian synaptic plasticity. Neural computation, 29(5), 1229-1262.](https://doi.org/10.1162/NECO_a_00949)
- [Song, Y., Lukasiewicz, T., Xu, Z., & Bogacz, R. (2020). Can the Brain Do Backpropagation?---Exact Implementation of Backpropagation in Predictive Coding Networks. Advances in neural information processing systems, 33, 22566-22579.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7610561/)
- [Rosenbaum, R. (2022). On the relationship between predictive coding and backpropagation. Plos one, 17(3), e0266102.](https://doi.org/10.1371/journal.pone.0266102)
- [Millidge, B., Tschantz, A., & Buckley, C. L. (2022). Predictive coding approximates backprop along arbitrary computation graphs. Neural Computation, 34(6), 1329-1368.](https://doi.org/10.1162/neco_a_01497)
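A minimal sketch of backpropagation in the spirit of Rumelhart, Hinton & Williams (1986): one hidden layer of sigmoid units trained on XOR with squared error. The layer size, learning rate, and iteration count are illustrative, NumPy is assumed, and training can occasionally stall depending on initialization.

```python
import numpy as np

rng = np.random.default_rng(6)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - Y) * out * (1 - out)         # backward pass: error derivatives
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())                     # should approach [0, 1, 1, 0]
```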
- **Tensor Networks**
- [Pellionisz, A., & Llinás, R. (1980). Tensorial approach to the geometry of brain function: Cerebellar coordination via a metric tensor. Neuroscience, 5(7), 1125-1136.](https://doi.org/10.1016/0306-4522(80)90191-8)
- [Pellionisz, A., & Llinás, R. (1982). Tensor theory of brain function. The cerebellum as a space-time metric. In Competition and cooperation in neural nets (pp. 394-417). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-46466-9_23)
- [Pellionisz, A. J. (1985). Robotics connected to neurobiology by tensor theory of brain function. In IEEE Proceedings of the International Conference on Cybernetics and Society (pp. 411-414).](http://www.junkdna.com/1985_ieee_proc_systems_man_cybernetics.pdf)
- [Pellionisz, A., & Llinas, R. (1985). Tensor network theory of the metaorganization of functional geometries in the central nervous system. Neuroscience, 16(2), 245-273.](https://doi.org/10.1016/0306-4522(85)90001-6)
- [Pellionisz, A. J. (1986). Tensor network theory of the central nervous system and sensorimotor modeling. In Brain theory (pp. 121-145). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-70911-1_8)
- [Pellionisz, A. (1988). Tensorial aspects of the multidimensional massively parallel sensorimotor function of neuronal networks. Progress in brain research, 76, 341-354.](https://doi.org/10.1016/S0079-6123(08)64521-5)
- [Pellionisz, A. (1989). Tensor geometry: A language of brains & neurocomputers. Generalized coordinates in neuroscience & robotics. In Neural Computers (pp. 381-391). Springer, Berlin, Heidelberg.](https://doi.org/10.1007/978-3-642-83740-1_39)
- [Pellionisz, A. (1989). Tensor Network Model of the Cerebellum and its Olivary System. The Olivo-Cerebellar System in Motor Control, 400-424.](http://www.junkdna.com/1989_strata_olive_book_springer.pdf)
- [Lv, Z., Luo, S., Liu, Y., & Zheng, Y. (2006, August). Information geometry approach to the model selection of neural networks. In First International Conference on Innovative Computing, Information and Control-Volume I (ICICIC'06) (Vol. 3, pp. 419-422). IEEE.](https://doi.org/10.1109/ICICIC.2006.463)
- [Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. Advances in neural information processing systems, 26.](https://dl.acm.org/doi/10.5555/2999611.2999715)
- [Orús, R. (2014). A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of physics, 349, 117-158.](https://doi.org/10.1016/j.aop.2014.06.013)
- [Orús, R. (2014). Advances on tensor network theory: symmetries, fermions, entanglement, and holography. The European Physical Journal B, 87(11), 1-18.](https://doi.org/10.1140/epjb/e2014-50502-9)
- [Phien, H. N., McCulloch, I. P., & Vidal, G. (2015). Fast convergence of imaginary time evolution tensor network algorithms by recycling the environment. Physical Review B, 91(11), 115137.](https://doi.org/10.1103/PhysRevB.91.115137)
- [Evenbly, G., & Vidal, G. (2015). Tensor network renormalization. Physical review letters, 115(18), 180405.](https://doi.org/10.1103/PhysRevLett.115.180405)
- [Czech, B., Lamprou, L., McCandlish, S., & Sully, J. (2015). Integral geometry and holography. Journal of High Energy Physics, 2015(10), 1-41.](https://doi.org/10.1007/JHEP10(2015)175)
- [Sully, J. (2015) Geometry from Compression. Perimeter Institute Recorded Seminar Archive. 3 Feb.](https://pirsa.org/15020080)
- It would be interesting to see what deep connections exist between this and [information geometry](https://en.wikipedia.org/wiki/Information_geometry).
- [Czech, B., Lamprou, L., McCandlish, S., & Sully, J. (2016). Tensor networks from kinematic space. Journal of High Energy Physics, 2016(7), 1-38.](https://doi.org/10.1007/JHEP07(2016)100)
- [Gan, W. C., & Shu, F. W. (2017). Holography as deep learning. International Journal of Modern Physics D, 26(12), 1743020.](https://doi.org/10.1142/S0218271817430209)
- [Chirco, G., Oriti, D., & Zhang, M. (2018). Group field theory and tensor networks: towards a Ryu–Takayanagi formula in full quantum gravity. Classical and Quantum Gravity, 35(11), 115011.](https://doi.org/10.1088/1361-6382/aabf55)
- [Ganchev, A. (2019, February). On bulk/boundary duality and deep networks. In AIP Conference Proceedings (Vol. 2075, No. 1, p. 100002). AIP Publishing LLC.](https://doi.org/10.1063/1.5091246)
- [Zhang, Q., Guo, B., Kong, W., Xi, X., Zhou, Y., & Gao, F. (2021). Tensor-based dynamic brain functional network for motor imagery classification. Biomedical Signal Processing and Control, 69, 102940.](https://doi.org/10.1016/j.bspc.2021.102940)
- [Kobayashi, M. (2021). Information geometry of hyperbolic-valued Boltzmann machines. Neurocomputing, 431, 163-168.](https://doi.org/10.1016/j.neucom.2020.12.048)
- [Howard, E. (2021). Holographic renormalization with machine learning. In Emerging Technologies in Data Mining and Information Security (pp. 253-261). Springer, Singapore.](https://doi.org/10.1007/978-981-15-9774-9_24)
- [Park, C., Hwang, C. O., Cho, K., & Kim, S. J. (2022). Dual Geometry of Entanglement Entropy via Deep Learning. arXiv preprint arXiv:2205.04445.](https://doi.org/10.48550/arXiv.2205.04445)
- [Gesteau, E., Marcolli, M., & Parikh, S. (2022). Holographic tensor networks from hyperbolic buildings. arXiv preprint arXiv:2202.01788.](https://doi.org/10.48550/arXiv.2202.01788)
- **Autoencoders**
- [Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of mathematical biology, 15, 267-273.](http://jeti.uni-freiburg.de/studenten_seminar/term_paper_WS_16_17/Oja.pdf)
- [Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological cybernetics, 59(4-5), 291-294.](https://www.researchgate.net/profile/Herve-Bourlard/publication/19959069_Auto-Association_by_Multilayer_Perceptrons_and_Singular_Value_Decomposition/links/57600aaa08aeeada5bc2b4cc/Auto-Association-by-Multilayer-Perceptrons-and-Singular-Value-Decomposition.pdf)
- [Kramer, M. A. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE journal, 37(2), 233-243.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=87c280d0dc204ca5db0d325991a21c211aeec866)
- [Japkowicz, N., Hanson, S. J., & Gluck, M. A. (2000). Nonlinear autoassociation is not equivalent to PCA. Neural computation, 12(3), 531-545.](https://direct.mit.edu/neco/article-abstract/12/3/531/6350/Nonlinear-Autoassociation-Is-Not-Equivalent-to-PCA?redirectedFrom=fulltext)
- [Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. science, 313(5786), 504-507.](https://www.science.org/doi/10.1126/science.1127647)
- [Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. In Artificial Neural Networks and Machine Learning–ICANN 2011: 21st International Conference on Artificial Neural Networks, Espoo, Finland, June 14-17, 2011, Proceedings, Part I 21 (pp. 44-51). Springer Berlin Heidelberg.](https://web.archive.org/web/20181219212445/https://www.cs.toronto.edu/~fritz/absps/transauto6.pdf)
- [Chicco, D., Sadowski, P., & Baldi, P. (2014, September). Deep autoencoder neural networks for gene ontology annotation predictions. In Proceedings of the 5th ACM conference on bioinformatics, computational biology, and health informatics (pp. 533-540).](https://dl.acm.org/doi/10.1145/2649387.2649442)
- [Plaut, E. (2018). From principal subspaces to principal components with linear autoencoders. arXiv preprint arXiv:1804.10253.](https://arxiv.org/abs/1804.10253)
- [Kingma, D. P., & Welling, M. (2019). An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392.](https://arxiv.org/abs/1906.02691)
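A minimal sketch of a linear autoencoder with a two-unit bottleneck, trained by gradient descent on reconstruction error; as the Bourlard & Kamp (1988) and Plaut (2018) entries above discuss, its solution spans the top principal subspace, so its error should approach that of a rank-2 PCA reconstruction. Assumes NumPy; data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))    # correlated toy data
X -= X.mean(axis=0)

W_enc = rng.normal(scale=0.1, size=(5, 2))
W_dec = rng.normal(scale=0.1, size=(2, 5))
lr = 0.01
for _ in range(5000):
    Z = X @ W_enc                                 # encode
    err = Z @ W_dec - X                           # reconstruction error
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

U, S, Vt = np.linalg.svd(X, full_matrices=False)
pca2 = (U[:, :2] * S[:2]) @ Vt[:2]                # best rank-2 reconstruction
print("autoencoder MSE:", np.mean((X @ W_enc @ W_dec - X) ** 2))
print("rank-2 PCA MSE :", np.mean((pca2 - X) ** 2))
```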
- [Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive science, 9(1), 147-169.](https://www.cs.toronto.edu/~fritz/absps/cogscibm.pdf)
- [Krauth, W., & Mézard, M. (1987). Learning algorithms with optimal stability in neural networks. Journal of Physics A: Mathematical and General, 20(11), L745.](https://www.researchgate.net/profile/Marc-Mezard/publication/230920580_Learning_algorithms_with_optimal_stability_in_neural_networks/links/09e4151439966bc5a8000000/Learning-algorithms-with-optimal-stability-in-neural-networks.pdf)
- [Grossberg, S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural networks, 1(1), 17-61.](https://web.archive.org/web/20231228180021/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8ae6f957b6a4615ade96cb61a6d6b3e6083cadbd)
- **BAM**
- [Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Systems, man, and Cybernetics, 18(1), 49-60.](https://ieeexplore.ieee.org/document/87054)
- [Rakkiyappan, R., Chandrasekar, A., Lakshmanan, S., & Park, J. H. (2015). Exponential stability for markovian jumping stochastic BAM neural networks with mode‐dependent probabilistic time‐varying delays and impulse control. Complexity, 20(3), 39-65.](https://onlinelibrary.wiley.com/doi/10.1002/cplx.21503)
- **Gradient Descent**
- [Lewis, J. P. (1988, July). Creation by refinement: a creativity paradigm for gradient descent learning networks. In ICNN (pp. 229-233).](https://ieeexplore.ieee.org/document/23933)
- [Hanson, S. J. (1990). A stochastic version of the delta rule. Physica D: Nonlinear Phenomena, 42(1-3), 265-272.](https://doi.org/10.1016/0167-2789(90)90081-Y)
- [LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.](https://web.archive.org/web/20180221193253/yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
- [Amari, S. I., & Douglas, S. C. (1998, May). Why natural gradient?. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98 (Cat. No. 98CH36181) (Vol. 2, pp. 1213-1216). IEEE.](https://web.archive.org/web/20240104155350/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=4a165d81336028cb257191540109178adf275fea) — applies ideas from information geometry to gradient descent.
- [Hochreiter, S., Younger, A. S., & Conwell, P. R. (2001). Learning to learn using gradient descent. In Artificial Neural Networks---ICANN 2001: International Conference Vienna, Austria, August 21--25, 2001 Proceedings 11 (pp. 87-94). Springer Berlin Heidelberg.](http://www.bioinf.jku.at/publications/older/1504.pdf)
- [Du, S. S., Zhai, X., Poczos, B., & Singh, A. (2018). Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054.](https://arxiv.org/abs/1810.02054)
- [Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4), 303-314.](https://hal.science/hal-03753170/document) — complexity and continuous valued networks
- **Networks training networks**
- [J. Schmidhuber. Networks adjusting networks. In J. Kindermann and A. Linden, editors, Proceedings of `Distributed Adaptive Neural Information Processing', St.Augustin, 24.-25.5. 1989, pages 197-208. Oldenbourg, 1990. Extended version: TR FKI-125-90 (revised), Institut für Informatik, TUM.](https://web.archive.org/web/20231219143905/https://people.idsia.ch/~juergen/FKI-125-90ocr.pdf)
- J. Schmidhuber. Additional remarks on G. Lukes' review of Schmidhuber's paper `Recurrent networks adjusted by adaptive critics'. Neural Network Reviews, 4(1):43, 1990.
- [Kinzel, W., & Rujan, P. (1990). Improving a network generalization ability by selecting examples. Europhysics Letters, 13(5), 473.](https://iopscience.iop.org/article/10.1209/0295-5075/13/5/016/meta)
- [Hinton, G. E. (1990). Connectionist learning procedures. In Machine learning (pp. 555-610). Morgan Kaufmann.](https://web.archive.org/web/20220616164359/https://files.eric.ed.gov/fulltext/ED294889.pdf) — Provides a pretty good summary of previous work up to this point. Doesn't mention everything though.
- **Renormalization/Information Bottleneck** (the information bottleneck objective is summarized after this group)
- [Equitz, W. H., & Cover, T. M. (1991). Successive refinement of information. IEEE Transactions on Information Theory, 37(2), 269-275.](https://web.archive.org/web/20221222072855/https://www-isl.stanford.edu/people/cover/papers/transIT/0269equi.pdf)
- [Kramer, M. A. (1992). Autoassociative neural networks. Computers & chemical engineering, 16(4), 313-328.](https://doi.org/10.1016/0098-1354(92)80051-A)
- [Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152).](https://dl.acm.org/doi/10.1145/130385.130401)
- [Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method. arXiv preprint physics/0004057.](https://arxiv.org/abs/physics/0004057)
- [Gilad-Bachrach, R., Navot, A., & Tishby, N. (2003). An information theoretic tradeoff between complexity and accuracy. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings (pp. 595-609). Springer Berlin Heidelberg.](https://www.researchgate.net/profile/Naftali-Tishby/publication/2831344_An_Information_Theoretic_Tradeoff_between_Complexity_and_Accuracy/links/0c96051816ace453e2000000/An-Information-Theoretic-Tradeoff-between-Complexity-and-Accuracy.pdf)
- [Shamir, O., Sabato, S., & Tishby, N. (2010). Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29-30), 2696-2711.](https://doi.org/10.1016/j.tcs.2010.04.006)
- [Neyshabur, B., Tomioka, R., & Srebro, N. (2014). In search of the real inductive bias: On the role of implicit regularization in deep learning. arXiv preprint arXiv:1412.6614.](https://arxiv.org/abs/1412.6614)
- [Mehta, P., & Schwab, D. J. (2014). An exact mapping between the variational renormalization group and deep learning. arXiv preprint arXiv:1410.3831.](https://arxiv.org/abs/1410.3831)
- [Tishby, N., & Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In 2015 ieee information theory workshop (itw) (pp. 1-5). IEEE.](https://arxiv.org/abs/1503.02406)
- [Alemi, A. A., Fischer, I., Dillon, J. V., & Murphy, K. (2016). Deep variational information bottleneck. arXiv preprint arXiv:1612.00410.](https://arxiv.org/abs/1612.00410)
- [Shwartz-Ziv, R., & Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.](https://arxiv.org/abs/1703.00810)
- [Saxe, A. M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B. D., & Cox, D. D. (2019). On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12), 124020.](https://artemyk.github.io/assets/pdf/papers/Saxe%20et%20al_2019_On%20the%20information%20bottleneck%20theory%20of%20deep%20learning.pdf)
- [Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.](https://dl.acm.org/doi/abs/10.1145/3446776)
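
A one-line summary of the objective most of these papers analyze, assuming the standard formulation: a stochastic encoding $T$ of the input $X$ is chosen to be maximally compressive while staying informative about the target $Y$,

$$
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y)
$$

where larger $\beta$ trades compression for predictive accuracy.
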
- **Continuous-time Networks** — many of these build on ideas from the Neural Fields work above.
- [Funahashi, K. I., & Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural networks, 6(6), 801-806.](https://www.sciencedirect.com/science/article/abs/pii/S089360800580125X)
- [Williamson, M. M. (1998). Neural control of rhythmic arm movements. Neural networks, 11(7-8), 1379-1394.](https://groups.csail.mit.edu/lbr/hrg/1998/nn-journal.pdf)
- [Williamson, M. M. (1998, October). Rhythmic robot arm control using oscillators. In Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications (Cat. No. 98CH36190) (Vol. 1, pp. 77-83). IEEE.](https://groups.csail.mit.edu/lbr/hrg/1998/mattw-iros98.pdf)
- [Zehr, E. P., Balter, J. E., Ferris, D. P., Hundza, S. R., Loadman, P. M., & Stoloff, R. H. (2007). Neural regulation of rhythmic arm and leg movement is conserved across human locomotor tasks. The Journal of physiology, 582(1), 209-227.](https://physoc.onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.2007.133843)
- [Zehr, E. P. (2005). Neural control of rhythmic human movement: the common core hypothesis. Exercise and sport sciences reviews, 33(1), 54-60.](https://journals.lww.com/acsm-essr/Fulltext/2005/01000/Neural_Control_of_Rhythmic_Human_Movement__The.10.aspx)
- [Hasani, R. M., Haerle, D., & Grosu, R. (2016, June). Efficient modeling of complex analog integrated circuits using neural networks. In 2016 12th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME) (pp. 1-4). IEEE.](https://ti.tuwien.ac.at/cps/people/grosu/files/prime16.pdf)
- [Mozer, M. C., Kazakov, D., & Lindsey, R. V. (2017). Discrete event, continuous time rnns. arXiv preprint arXiv:1710.04110.](https://arxiv.org/abs/1710.04110)
- [Gleeson, P., Lung, D., Grosu, R., Hasani, R., & Larson, S. D. (2018). c302: a multiscale framework for modelling the nervous system of Caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1758), 20170379.](https://royalsocietypublishing.org/doi/full/10.1098/rstb.2017.0379)
- [Gu, A., Dao, T., Ermon, S., Rudra, A., & Ré, C. (2020). Hippo: Recurrent memory with optimal polynomial projections. Advances in neural information processing systems, 33, 1474-1487.](https://proceedings.neurips.cc/paper/2020/file/102f0bb6efb3a6128a3c750dd16729be-Paper.pdf)
- [Lechner, M., Hasani, R., Amini, A., Henzinger, T. A., Rus, D., & Grosu, R. (2020). Neural circuit policies enabling auditable autonomy. Nature Machine Intelligence, 2(10), 642-652.](https://web.archive.org/web/20211124185521/https://publik.tuwien.ac.at/files/publik_292280.pdf)
- [Hasani, R., Lechner, M., Amini, A., Rus, D., & Grosu, R. (2021). Liquid Time-constant Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7657-7666.](https://doi.org/10.1609/aaai.v35i9.16936) — [preprint](https://arxiv.org/abs/2006.04439)
- [Vorbach, C., Hasani, R., Amini, A., Lechner, M., & Rus, D. (2021). Causal navigation by continuous-time neural networks. Advances in Neural Information Processing Systems, 34, 12425-12440.](https://proceedings.neurips.cc/paper/2021/file/67ba02d73c54f0b83c05507b7fb7267f-Paper.pdf)
- [Hasani, R., Lechner, M., Amini, A., Liebenwein, L., Ray, A., Tschaikowski, M., ... & Rus, D. (2022). Closed-form continuous-time neural networks. Nature Machine Intelligence, 1-12.](https://www.nature.com/articles/s42256-022-00556-7)
- [Hasani, R., Lechner, M., Wang, T. H., Chahine, M., Amini, A., & Rus, D. (2022). Liquid structural state-space models. arXiv preprint arXiv:2209.12951.](https://arxiv.org/pdf/2209.12951.pdf)
- [Balcázar, J. L., Gavalda, R., & Siegelmann, H. T. (1997). Computational power of neural networks: A characterization in terms of Kolmogorov complexity. IEEE Transactions on Information Theory, 43(4), 1175-1183.](https://people.cs.umass.edu/~binds/papers/1997_Balcazar_IEEETransInfoTheory.pdf)
- **LSTM** (the standard cell equations appear after this group)
- [Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory)
- [Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=11540131eae85b2e11d53df7f1360eeb6476e7f4)
- [Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling.](https://web.archive.org/web/20180424203806/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43905.pdf)
- [Li, X., & Wu, X. (2015, April). Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In 2015 ieee international conference on acoustics, speech and signal processing (icassp) (pp. 4520-4524). IEEE.](https://arxiv.org/abs/1410.4281)
- [Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2015). LSTM: a search space odyssey. CoRR. arXiv preprint arXiv:1503.04069.](https://arxiv.org/abs/1503.04069)
- [Malhotra, P., Vig, L., Shroff, G., & Agarwal, P. (2015, April). Long Short Term Memory Networks for Anomaly Detection in Time Series. In ESANN (Vol. 2015, p. 89).](https://web.archive.org/web/20200830034708/https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-56.pdf)
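
For reference, the cell equations in the now-standard form (the forget gate $f_t$ comes from the Gers et al. paper; the 1997 original omits it), with $\sigma$ the logistic sigmoid and $\odot$ elementwise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
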
- **Deep Neural Networks**
- [Hinton, G. E. (2007). Learning multiple layers of representation. Trends in cognitive sciences, 11(10), 428-434.](http://www.cs.toronto.edu/~hinton/absps/tics.pdf)
- [Bengio, Y. (2009). Learning deep architectures for AI. Foundations and trends® in Machine Learning, 2(1), 1-127.](https://web.archive.org/web/20230720043840/https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf)
- [Glorot, X., & Bengio, Y. (2010, March). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249-256). JMLR Workshop and Conference Proceedings.](https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf) — "Xavier" parameter/weight initialization (a short initialization sketch follows this group)
- [Saxe, A. M., McClelland, J. L., & Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.](https://ora.ox.ac.uk/objects/uuid:49f24917-64cc-4d41-ba52-7aabc67662ce/download_file?file_format=pdf&hyrax_fileset_id=mb152eea8148aaa2d01d92ef760ea1dcb&safe_filename=1312.6120v3.pdf&type_of_work=Conference+item)
- [LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521(7553), 436-444.](https://neuron.eng.wayne.edu/auth/ece512/Deep_Learning_Hinton.pdf)
- [Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks?. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper_files/paper/2014/hash/375c71349b295fbe2dcdca9206f20a06-Abstract.html)
- [Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 427-436).](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf)
- [Schoenholz, S. S., Gilmer, J., Ganguli, S., & Sohl-Dickstein, J. (2016). Deep information propagation. arXiv preprint arXiv:1611.01232.](https://arxiv.org/pdf/1611.01232)
- [Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J., & Ganguli, S. (2016). Exponential expressivity in deep neural networks through transient chaos. Advances in neural information processing systems, 29.](https://proceedings.neurips.cc/paper/2016/file/148510031349642de5ca0c544f31b2ef-Paper.pdf)
- [Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., & Sohl-Dickstein, J. (2017, July). On the expressive power of deep neural networks. In international conference on machine learning (pp. 2847-2854). PMLR.](https://proceedings.mlr.press/v70/raghu17a/raghu17a.pdf)
- [Schmidhuber, J. (2020). Deep learning: our miraculous year 1990-1991. arXiv preprint arXiv:2005.05744.](https://arxiv.org/abs/2005.05744)
- [Jentzen, A., Kuckuck, B., & von Wurstemberger, P. (2023). Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory. arXiv preprint arXiv:2310.20360.](https://arxiv.org/abs/2310.20360)
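
A minimal sketch of the two initialization schemes noted in this list: Glorot & Bengio's "Xavier" scheme scales the weight variance by both fan-in and fan-out, while the He et al. scheme (listed under the convolutional networks below) uses weight variance 2/fan-in, derived for ReLU units. Layer sizes here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Glorot & Bengio (2010): keep activation/gradient variance roughly
    constant across layers by scaling with both fan-in and fan-out."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    """He et al. (2015): variance 2 / fan-in, derived for ReLU activations."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W1 = xavier_uniform(784, 256)   # e.g. a hypothetical 784 -> 256 layer
W2 = he_normal(256, 10)
print(W1.std(), W2.std())
```
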
- **Deep Recurrent Neural Networks**
- CTRNN — [Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment. PLoS computational biology, 4(11), e1000220.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570613/)
- **Deep Convolutional Neural Networks**
- [Ciregan, D., Meier, U., & Schmidhuber, J. (2012, June). Multi-column deep neural networks for image classification. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3642-3649). IEEE.](https://arxiv.org/pdf/1202.2745.pdf)
- [Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- [Maitra, D. S., Bhattacharya, U., & Parui, S. K. (2015, August). CNN based common approach to handwritten character recognition of multiple scripts. In 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (pp. 1021-1025). IEEE.](https://ieeexplore.ieee.org/document/7333916)
- [He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026-1034).](https://arxiv.org/pdf/1502.01852v1) — details "Kaiming He" parameter/weight initialization (compare the initialization sketch above)
- [Zhang, R. (2019, May). Making convolutional networks shift-invariant again. In International conference on machine learning (pp. 7324-7334). PMLR.](http://proceedings.mlr.press/v97/zhang19a/zhang19a.pdf)
- [Mouton, C., Myburgh, J. C., & Davel, M. H. (2020). Stride and translation invariance in CNNs. In Artificial Intelligence Research: First Southern African Conference for AI Research, SACAIR 2020, Muldersdrift, South Africa, February 22-26, 2021, Proceedings 1 (pp. 267-281). Springer International Publishing.](https://arxiv.org/pdf/2103.10097.pdf)
- [Stelzer, F., Röhm, A., Vicente, R., Fischer, I., & Yanchuk, S. (2021). Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops. Nature communications, 12(1), 5164.](https://www.nature.com/articles/s41467-021-25427-4)
- **Random Matrices**
- [Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2015, February). The loss surfaces of multilayer networks. In Artificial intelligence and statistics (pp. 192-204). PMLR.](https://proceedings.mlr.press/v38/choromanska15.pdf)
- [Schoenholz, S. S., Pennington, J., & Sohl-Dickstein, J. (2017). A correspondence between random neural networks and statistical field theory. arXiv preprint arXiv:1710.06570.](https://arxiv.org/pdf/1710.06570)
- [Pennington, J., & Worah, P. (2017). Nonlinear random matrix theory for deep learning. Advances in neural information processing systems, 30.](https://proceedings.neurips.cc/paper_files/paper/2017/file/0f3d014eead934bbdbacb62a01dc4831-Paper.pdf)
- [Pennington, J., & Bahri, Y. (2017, July). Geometry of neural network loss surfaces via random matrix theory. In International conference on machine learning (pp. 2798-2806). PMLR.](https://proceedings.mlr.press/v70/pennington17a/pennington17a.pdf)
- [Louart, C., Liao, Z., & Couillet, R. (2018). A random matrix approach to neural networks. The Annals of Applied Probability, 28(2), 1190-1248.](https://web.archive.org/web/20190504163506id_/https://hal.archives-ouvertes.fr/hal-01962070/document)
- [Martin, C. H., & Mahoney, M. W. (2021). Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(165), 1-73.](https://www.jmlr.org/papers/volume22/20-410/20-410.pdf)
- [Baskerville, N. P. (2023). Random matrix theory and the loss surfaces of neural networks. arXiv preprint arXiv:2306.02108.](https://arxiv.org/pdf/2306.02108)
- [Couillet, R., & Liao, Z. (2023). Random Matrix Methods for Machine Learning.](https://zhenyu-liao.github.io/pdf/RMT4ML.pdf)
- **Background**
- [Tao, T. (2012). Topics in random matrix theory (Vol. 132). American Mathematical Soc.](https://terrytao.wordpress.com/wp-content/uploads/2011/08/matrix-book.pdf)
- [El Karoui, N. (2010). The spectrum of kernel random matrices.](https://web.archive.org/web/20240901035025/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=36db8c56cf9d14c1250ba3647452faa42fab171c)
- [Cheng, X., & Singer, A. (2013). The spectrum of random inner-product kernel matrices. Random Matrices: Theory and Applications, 2(04), 1350010.](https://arxiv.org/pdf/1202.3155)
- **Hyperdimensional Computing**
- [Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive computation, 1, 139-159.](https://cs.uwaterloo.ca/~jhoey/teaching/cogsci600/papers/Kanerva09.pdf)
- Based on Distributed Memories, Holographic Reduced Representation, Spatter Code, Semantic Vectors, Latent Semantic Analysis, Context-dependent Thinning, Vector Symbolic Architecture, and a few other ideas (a minimal binding/bundling sketch follows this group).
- [Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial intelligence, 46(1-2), 159-216.](http://www.lscp.net/persons/dupoux/teaching/AT1_2012/papers/Smolensky_1990_TensorProductVariableBinding.AI.pdf)
- [Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural networks, 6(3), 623-641.](https://web.archive.org/web/20231230170300/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=645698cdb52b0f1fdcac55da91e56f7ffd935d15)
- Kanerva, P. (1994). The spatter code for encoding concepts at many levels. In ICANN’94: Proceedings of the International Conference on Artificial Neural Networks Sorrento, Italy, 26–29 May 1994 Volume 1, Parts 1 and 2 4 (pp. 226-229). Springer London.
- Gayler, R. W. (1998). Multiplicative binding, representation operators & analogy (workshop poster).
- [Rachkovskij, D. A., & Kussul, E. M. (2001). Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Computation, 13(2), 411-452.](https://web.archive.org/web/20231230171234/https://core.ac.uk/download/pdf/207530132.pdf)
- [Gayler, R. W. (2004). Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. arXiv preprint cs/0412059.](https://arxiv.org/abs/cs/0412059)
- [Aerts, D., & Czachor, M. (2008). Tensor-product versus geometric-product coding. Physical Review A, 77(1), 012316.](https://arxiv.org/pdf/0709.1268.pdf)
- See full bibliography at https://view-awesome-table.com/-M9Q2TlY5jMY3xXFfNHl/view
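
A minimal sketch, assuming the common bipolar-hypervector conventions from Kanerva (2009): binding is elementwise multiplication (self-inverse), bundling is an elementwise majority vote, and similarity is a normalized dot product. The dimensionality and the role/filler names are arbitrary illustrative choices.

```python
import numpy as np

D = 10_000                                        # hypervector dimensionality
rng = np.random.default_rng(0)
rand_hv = lambda: rng.choice([-1, 1], size=D)     # random bipolar hypervector

COLOR, SHAPE = rand_hv(), rand_hv()               # hypothetical role vectors
RED, CIRCLE = rand_hv(), rand_hv()                # hypothetical filler vectors

bind = lambda a, b: a * b                         # binding: elementwise multiply (self-inverse)
bundle = lambda *vs: np.sign(np.sum(vs, axis=0))  # bundling: elementwise majority vote
sim = lambda a, b: a @ b / D                      # normalized dot-product similarity

record = bundle(bind(COLOR, RED), bind(SHAPE, CIRCLE))  # encodes {COLOR: RED, SHAPE: CIRCLE}

probe = bind(record, COLOR)     # unbinding the COLOR role...
print(sim(probe, RED))          # ...lands much closer to RED (~0.5)
print(sim(probe, CIRCLE))       # ...than to an unrelated filler (~0.0)
```
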
- [Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.](https://arxiv.org/abs/1207.0580)
- **seq2seq**
- [Graves, A. (2012). Sequence transduction with recurrent neural networks. arXiv preprint arXiv:1211.3711.](https://doi.org/10.48550/arXiv.1211.3711)
- [Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper/2014/hash/a14ac55a4f27472c5d894ec1c3c743d2-Abstract.html)
- [Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.](https://doi.org/10.48550/arXiv.1409.0473)
- **GAN** — [Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.](https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf)
- **Distillation**
- [Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.](https://arxiv.org/abs/1503.02531)
- [Anil, R., Pereyra, G., Passos, A., Ormandi, R., Dahl, G. E., & Hinton, G. E. (2018). Large scale distributed neural network training through online distillation. arXiv preprint arXiv:1804.03235.](https://arxiv.org/abs/1804.03235)
- **Diffusion** (the forward noising process is summarized after this group)
- [Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June). Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning (pp. 2256-2265). PMLR.](https://arxiv.org/abs/1503.03585)
- [Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32.](https://arxiv.org/abs/1907.05600)
- [Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in neural information processing systems, 33, 6840-6851.](https://arxiv.org/abs/2006.11239)
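
For orientation, the forward (noising) process used in the denoising diffusion models above, in the standard notation: data $x_0$ is gradually corrupted by Gaussian noise according to a variance schedule $\beta_t$, and the model is trained to reverse the process by predicting the added noise,

$$
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\big), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
$$
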
- [Soudry, D., Di Castro, D., Gal, A., Kolodny, A., & Kvatinsky, S. (2015). Memristor-based multilayer neural networks with online gradient descent training. IEEE transactions on neural networks and learning systems, 26(10), 2408-2421.](https://web.archive.org/web/20160916163309/http://webee.technion.ac.il/people/skva/TNNLS.pdf)
- [Widrow, B., Greenblatt, A., Kim, Y., & Park, D. (2013). The no-prop algorithm: A new learning algorithm for multilayer neural networks. Neural Networks, 37, 182-188.](https://web.archive.org/web/20230607025139/https://www-isl.stanford.edu/~widrow/papers/131.no_prop_neural_networks.pdf)
- [Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.](https://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
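
A minimal sketch of dropout at training time, in the "inverted" form common in modern code (survivors are rescaled by the reciprocal keep probability so nothing changes at test time); the original paper instead scales weights by the keep probability at test time. Shapes and probabilities below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5):
    """Inverted dropout (training only): zero each unit with probability
    1 - keep_prob, rescale survivors so the expected activation is unchanged."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.ones((4, 8))                  # hypothetical hidden-layer activations
print(dropout(h, keep_prob=0.8))     # ~20% of entries zeroed, the rest scaled to 1.25
```
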
- [Demircigil, M., Heusel, J., Löwe, M., Upgang, S., & Vermet, F. (2017). On a model of associative memory with huge storage capacity. Journal of Statistical Physics, 168, 288-299.](https://link.springer.com/article/10.1007/s10955-017-1806-y)
- [Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in neural information processing systems, 30.](https://proceedings.neurips.cc/paper/2017/file/2cad8fa47bbef282badbb8de5374b894-Paper.pdf)
- [Li, Y., Gimeno, F., Kohli, P., & Vinyals, O. (2020). Strong generalization and efficiency in neural programs. arXiv preprint arXiv:2007.03629.](https://arxiv.org/pdf/2007.03629.pdf)
- [Bubeck, S., & Sellke, M. (2021). A universal law of robustness via isoperimetry. Advances in Neural Information Processing Systems, 34, 28811-28822.](https://arxiv.org/abs/2105.12806)
- [Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L. K., Catt, E., ... & Ortega, P. A. (2022). Neural networks and the Chomsky hierarchy. arXiv preprint arXiv:2207.02098.](https://arxiv.org/pdf/2207.02098)
- [Barnett, A. J., Guo, Z., Jing, J., Ge, W., Rudin, C., & Westover, M. B. (2022). An Interpretable Machine Learning System to Identify EEG Patterns on the Ictal-Interictal-Injury Continuum. arXiv preprint arXiv:2211.05207.](https://arxiv.org/abs/2211.05207)
- [Song, Y., Millidge, B., Salvatori, T., Lukasiewicz, T., Xu, Z., & Bogacz, R. (2024). Inferring neural activity before plasticity as a foundation for learning beyond backpropagation. Nature Neuroscience, 1-11.](https://doi.org/10.1038/s41593-023-01514-1)
## Category Theory for AI
- [Fong, B., Spivak, D., & Tuyéras, R. (2017). Backprop as functor: A compositional perspective on supervised learning. arXiv preprint arXiv:1711.10455.](https://arxiv.org/abs/1711.10455)
- [Fong, B., & Johnson, M. (2019). Lenses and learners. arXiv preprint arXiv:1903.03671.](https://arxiv.org/pdf/1903.03671)
- Cites: [Bohannon, A., Pierce, B. C., & Vaughan, J. A. (2006, June). Relational lenses: a language for updatable views. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (pp. 338-347).](https://web.archive.org/web/20240901195839/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=8ee979ec0a1d8288a07a813c7bde91e4a2ae3187)
- [Shiebler, D., Gavranović, B., & Wilson, P. (2021). Category theory in machine learning. arXiv preprint arXiv:2106.07032.](https://arxiv.org/abs/2106.07032)
- [Gavranović, Bruno. "Fundamental Components of Deep Learning: A category-theoretic approach." arXiv preprint arXiv:2403.13001 (2024).](https://www.brunogavranovic.com/assets/FundamentalComponentsOfDeepLearning.pdf)
- [Gavranović, B., Lessard, P., Dudzik, A., von Glehn, T., Araújo, J. G., & Veličković, P. (2024). Categorical deep learning: An algebraic theory of architectures. arXiv preprint arXiv:2402.15332.](https://arxiv.org/abs/2402.15332)
- https://categoricaldeeplearning.com/
- https://cats.for.ai/
- **Background**
- [Bradley, T. D. (2018). What is applied category theory?.](https://arxiv.org/pdf/1809.05923.pdf)
- [Fong, B., Spivak, D. I. (2018). Seven sketches in compositionality: An invitation to applied category theory.](https://arxiv.org/pdf/1803.05316.pdf)
- [Baez, J., & Stay, M. (2010). Physics, topology, logic and computation: a Rosetta Stone. In New structures for physics (pp. 95-172). Springer, Berlin, Heidelberg.](https://math.ucr.edu/home/baez/rosetta.pdf)
- [Burrows, J. (2023). Understanding program control in pure functional programming with monads + comonads!?.](https://hackmd.io/@adaburrows/B1WzXFDyp)
- [Dixon, L., & Kissinger, A. (2013). Open-graphs and monoidal theories. Mathematical Structures in Computer Science, 23(2), 308-359.](https://arxiv.org/pdf/1011.4114.pdf)
- [Spivak, D. I. (2013). The operad of wiring diagrams: formalizing a graphical language for databases, recursion, and plug-and-play circuits. arXiv preprint arXiv:1305.0297.](https://arxiv.org/pdf/1305.0297.pdf)
- [Vagner, D., Spivak, D. I., & Lerman, E. (2014). Algebras of open dynamical systems on the operad of wiring diagrams. arXiv preprint arXiv:1408.1598.](https://arxiv.org/pdf/1408.1598.pdf)
- [Bonchi, F., Sobociński, P., & Zanasi, F. (2014, September). A categorical semantics of signal flow graphs. In International Conference on Concurrency Theory (pp. 435-450). Springer, Berlin, Heidelberg.](https://link.springer.com/chapter/10.1007/978-3-662-44584-6_30) — [HAL](https://hal.archives-ouvertes.fr/hal-02134182/file/sfg.pdf) — [archive](https://web.archive.org/web/20180626100531/https://www.southampton.ac.uk/~ps1a06/papers/sfg.pdf)
- [Vallette, B. (2014). Algebra+ homotopy= operad. Symplectic, Poisson, and noncommutative geometry, 62, 229-290.](https://arxiv.org/pdf/1202.3245.pdf)
- [Baez, J. C., & Fong, B. (2015). A compositional framework for passive linear networks. arXiv preprint arXiv:1504.05625.](https://arxiv.org/pdf/1504.05625.pdf)
- [Hackney, P., Robertson, M., & Yau, D. (2015). Infinity properads and infinity wheeled properads (Vol. 2147). Cham: Springer.](https://arxiv.org/pdf/1410.6716.pdf)
- [Fong, B. (2015). Decorated cospans. arXiv preprint arXiv:1502.00872.](https://arxiv.org/pdf/1502.00872.pdf)
- [Kock, J. (2016). Graphs, hypergraphs, and properads. Collectanea mathematica, 67(2), 155-190.](https://arxiv.org/pdf/1407.3744.pdf)
- [Baez, J. C., Fong, B., & Pollard, B. S. (2016). A compositional framework for Markov processes. Journal of Mathematical Physics, 57(3), 033301.](https://doi.org/10.1063/1.4941578) — [arXiv](https://arxiv.org/pdf/1508.06448.pdf)
- [Lerman, E., & Spivak, D. I. (2016). An algebra of open continuous time dynamical systems and networks. arXiv preprint arXiv:1602.01017.](https://arxiv.org/pdf/1602.01017v1.pdf) (Superseded by Lerman, E. (2018). Networks of open systems.)
- [Courser, K. (2016). A bicategory of decorated cospans. arXiv preprint arXiv:1605.08100.](https://arxiv.org/pdf/1605.08100.pdf)
- [Fong, B. (2016). The algebra of open and interconnected systems. arXiv preprint arXiv:1609.05382.](https://arxiv.org/pdf/1609.05382.pdf)
- [Coya, B., & Fong, B. (2016). Corelations are the prop for extraspecial commutative Frobenius monoids. arXiv preprint arXiv:1601.02307.](https://arxiv.org/pdf/1601.02307.pdf)
- [Hedges, J., Shprits, E., Winschel, V., & Zahn, P. (2016). Compositionality and string diagrams for game theory. arXiv preprint arXiv:1604.06061.](https://arxiv.org/pdf/1604.06061.pdf)
- [Clerc, F., Humphrey, H., & Panangaden, P. (2017). Bicategories of Markov processes. In Models, Algorithms, Logics and Tools (pp. 112-124). Springer, Cham.](https://www.cs.mcgill.ca/~prakash/Pubs/bicats_final.pdf)
- [Baez, J. C., & Courser, K. (2017). Coarse-graining open Markov processes. arXiv preprint arXiv:1710.11343.](https://arxiv.org/pdf/1710.11343.pdf)
- [Spivak, D. I., Schultz, P., & Rupel, D. (2017). String diagrams for traced and compact categories are oriented 1-cobordisms. Journal of Pure and Applied Algebra, 221(8), 2064-2110.](https://arxiv.org/pdf/1508.01069.pdf)
- [Spivak, D. I., & Tan, J. (2017). Nesting of dynamical systems and mode-dependent networks. Journal of Complex Networks, 5(3), 389-408.](https://arxiv.org/pdf/1502.07380.pdf)
- [Lerman, E. (2018). Networks of open systems. Journal of Geometry and Physics, 130, 81-112.](https://doi.org/10.1016/j.geomphys.2018.03.020) — [arXiv](https://arxiv.org/pdf/1705.04814.pdf).
- [Ghani, N., Hedges, J., Winschel, V., & Zahn, P. (2018, July). Compositional game theory. In Proceedings of the 33rd annual ACM/IEEE symposium on logic in computer science (pp. 472-481).](https://arxiv.org/pdf/1603.04641.pdf)
- [Bonchi, F., Seeber, J., & Sobocinski, P. (2018). Graphical conjunctive queries. arXiv preprint arXiv:1804.07626.](https://arxiv.org/pdf/1804.07626.pdf)
- [Bolt, J., Hedges, J., & Zahn, P. (2019). Bayesian open games. arXiv preprint arXiv:1910.03656.](https://arxiv.org/abs/1910.03656)
- [Fong, B., & Spivak, D. I. (2019). Hypergraph categories. Journal of Pure and Applied Algebra, 223(11), 4746-4777.](https://doi.org/10.1016/j.jpaa.2019.02.014)
- [Jenča, G. (2019). Two monads on the category of graphs. Mathematica Slovaca, 69(2), 257-266.](https://arxiv.org/pdf/1706.00081.pdf)
- [Hedges, J., & Herold, J. (2019). Foundations of brick diagrams. arXiv preprint arXiv:1908.10660.](https://arxiv.org/pdf/1908.10660.pdf)
- [Genovese, F., Knispel, A., & Fitzgerald, J. (2019). Mapping finite state machines to zk-SNARKS Using Category Theory. arXiv preprint arXiv:1909.02893.](https://arxiv.org/pdf/1909.02893.pdf)
- [Master, J., Patterson, E., Yousfi, S., & Canedo, A. (2020, August). String diagrams for assembly planning. In International Conference on Theory and Application of Diagrams (pp. 167-183). Springer, Cham.](https://arxiv.org/pdf/1909.10475.pdf)
- [Myers, D. J. (2020). Double categories of open dynamical systems. arXiv preprint arXiv:2005.05956.](https://arxiv.org/pdf/2005.05956.pdf)
- [Spivak, D. I. (2020). Poly: An abundant categorical setting for mode-dependent dynamics. arXiv preprint arXiv:2005.01894.](https://arxiv.org/pdf/2005.01894.pdf)
- [Libkind, S. (2020). An algebra of resource sharing machines. arXiv preprint arXiv:2007.14442.](https://arxiv.org/pdf/2007.14442.pdf)
- [Master, J. (2020). The open algebraic path problem. arXiv preprint arXiv:2005.06682.](https://arxiv.org/pdf/2005.06682.pdf)
- [Baez, J. C., Genovese, F., Master, J., & Shulman, M. (2021, June). Categories of nets. In 2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) (pp. 1-13). IEEE.](https://arxiv.org/pdf/2101.04238.pdf)
- [Master, J. E. (2021). Composing Behaviors of Networks. University of California, Riverside.](https://arxiv.org/pdf/2105.12905.pdf)
- [Chih, T., & Scull, L. (2021). A homotopy category for graphs. Journal of Algebraic Combinatorics, 53(4), 1231-1251.](https://arxiv.org/pdf/1901.01619.pdf)
- [Hackney, P. (2021). Categories of graphs for operadic structures. arXiv preprint arXiv:2109.06231.](https://arxiv.org/pdf/2109.06231.pdf)
- [Patterson, E., Lynch, O., & Fairbanks, J. (2021). Categorical data structures for technical computing. arXiv preprint arXiv:2106.04703.](https://arxiv.org/pdf/2106.04703.pdf)
- [Pisani, C. (2022). Operads as double functors. arXiv preprint arXiv:2208.07028.](https://arxiv.org/pdf/2208.07028.pdf)
- [Proudfoot, N., & Ramos, E. (2022). The contraction category of graphs. Representation Theory of the American Mathematical Society, 26(23), 673-697.](https://arxiv.org/pdf/1907.11234.pdf)
## Semantic Word Embeddings and Large Language Models
See [Semantic Word Embeddings / Word Vectors](/2IBLoQz4SsGougvgimR93w)
## Voice recognition and generation
- [Hunt, A. J., & Black, A. W. (1996, May). Unit selection in a concatenative speech synthesis system using a large speech database. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings (Vol. 1, pp. 373-376). IEEE.](https://www.ee.columbia.edu/~dpwe/e6820/papers/HuntB96-speechsynth.pdf)
- [Black, A. W., Zen, H., & Tokuda, K. (2007, April). Statistical parametric speech synthesis. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 4, pp. IV-1229). IEEE.](https://citeseerx.ist.psu.edu/doc/10.1.1.154.9874)
- [Loots, L. (2010). Data-driven augmentation of pronunciation dictionaries (Doctoral dissertation, Stellenbosch: University of Stellenbosch).](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.832.2872&rep=rep1&type=pdf)
- [Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.](https://arxiv.org/abs/1609.03499)
- [Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Lockhart, E., ... & Hassabis, D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis. arXiv preprint arXiv:1711.10433.](https://arxiv.org/abs/1711.10433)
- [Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., ... & Wu, Y. (2017). Natural TTS synthesis by conditioning WaveNet on Mel spectrogram predictions. arXiv preprint arXiv:1712.05884.](https://arxiv.org/abs/1712.05884)
- [Chen, Y., Assael, Y., Shillingford, B., Budden, D., Reed, S., Zen, H., ... & de Freitas, N. (2018). Sample efficient adaptive text-to-speech. arXiv preprint arXiv:1809.10460.](https://arxiv.org/abs/1809.10460v1)
- [Hsu, W. N., Zhang, Y., Weiss, R. J., Zen, H., Wu, Y., Wang, Y., ... & Pang, R. (2018). Hierarchical generative modeling for controllable speech synthesis. arXiv preprint arXiv:1810.07217.](https://arxiv.org/abs/1810.07217)
- [Li, Y., & Mandt, S. (2018). Disentangled sequential autoencoder. arXiv preprint arXiv:1803.02991.](https://arxiv.org/abs/1803.02991)
- [Chorowski, J., Weiss, R. J., Bengio, S., & Van Den Oord, A. (2019). Unsupervised speech representation learning using wavenet autoencoders. IEEE/ACM transactions on audio, speech, and language processing, 27(12), 2041-2053.](https://arxiv.org/abs/1901.08810)
- [Bińkowski, M., Donahue, J., Dieleman, S., Clark, A., Elsen, E., Casagrande, N., ... & Simonyan, K. (2019). High fidelity speech synthesis with adversarial networks. arXiv preprint arXiv:1909.11646.](https://arxiv.org/abs/1909.11646)
- [Habib, R., Mariooryad, S., Shannon, M., Battenberg, E., Skerry-Ryan, R. J., Stanton, D., ... & Bagby, T. (2019). Semi-supervised generative modeling for controllable speech synthesis. arXiv preprint arXiv:1910.01709.](https://arxiv.org/abs/1910.01709)
- [Chung, Y. A., Wang, Y., Hsu, W. N., Zhang, Y., & Skerry-Ryan, R. J. (2019, May). Semi-supervised training for improving data efficiency in end-to-end speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6940-6944). IEEE.](https://arxiv.org/abs/1808.10128)
- [Ren, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019, May). Almost unsupervised text to speech and automatic speech recognition. In International conference on machine learning (pp. 5410-5419). PMLR.](https://arxiv.org/abs/1905.06791)
- [Kong, J., Kim, J., & Bae, J. (2020). Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33, 17022-17033.](https://arxiv.org/abs/2010.05646v2)
- [Gong, Y., Chung, Y. A., & Glass, J. (2021). Ast: Audio spectrogram transformer. arXiv preprint arXiv:2104.01778.](https://arxiv.org/abs/2104.01778)
- [Leong, C. H., Huang, Y. H., & Chien, J. T. (2021). Online compressive transformer for end-to-end speech recognition. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 1500-1504). International Speech Communication Association.](https://www.isca-speech.org/archive/interspeech_2021/leong21_interspeech.html)
- [Agbavor, F., & Liang, H. (2022). Predicting dementia from spontaneous speech using large language models. PLOS Digital Health, 1(12), e0000168.](https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000168)
- [Lohrenz, T., Li, Z., & Fingscheidt, T. (2021). Multi-encoder learning and stream fusion for transformer-based end-to-end automatic speech recognition. arXiv preprint arXiv:2104.00120.](https://arxiv.org/abs/2104.00120)
- [Ristea, N. C., Ionescu, R. T., & Khan, F. S. (2022). SepTr: Separable Transformer for Audio Spectrogram Processing. arXiv preprint arXiv:2203.09581.](https://arxiv.org/abs/2203.09581)
- [Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.](https://arxiv.org/abs/2212.04356)
- [Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., ... & Wei, F. (2023). Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. arXiv preprint arXiv:2301.02111.](https://arxiv.org/abs/2301.02111)
- [Criminals Used AI To Clone Company Director's Voice And Steal $35 Million](https://screenrant.com/ai-deepfake-cloned-voice-bank-scam-theft-millions/)