We spent most of the lecture on the demos, to give you a sense of how far the state of the art in AI has advanced in the last few years. But we start with a short overview of some technical background.
Transformers are a simplification ("attention is all you need") of earlier deep neural network (DNN) architectures for natural language processing (and other tasks, including image processing).
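To make the "attention" idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer. The function and variable names and the toy dimensions are our own illustration, not code from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d) matrices of queries, keys, and values.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how strongly each query attends to each key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted average of the values

# Toy example: 4 tokens with 8-dimensional embeddings, self-attention (Q = K = V).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)   # (4, 8)
```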
In class we gave a quick introduction to DNNs and backpropagation, and hinted at the differences between Transformers and the earlier architectures known as recurrent neural networks (RNNs).
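As a reminder of what backpropagation does, here is a tiny self-contained sketch that trains a one-hidden-layer network on a toy regression problem with hand-coded gradients. The toy data and hyperparameters are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                 # 100 toy inputs with 2 features
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)        # target: the product of the features

W1, b1 = 0.5 * rng.normal(size=(2, 16)), np.zeros(16)   # hidden layer
W2, b2 = 0.5 * rng.normal(size=(16, 1)), np.zeros(1)    # output layer

lr = 0.05
for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    loss = ((pred - y) ** 2).mean()

    # Backward pass: chain rule applied layer by layer.
    d_pred = 2 * (pred - y) / len(X)
    dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)      # 1 - tanh^2 is the derivative of tanh
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```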
Given the amazing progress we have seen in the demos, what is the future of AI (and humanity)?
Here are some topics to think about. Add your own.
Choose one of the prompts below and post on the Slack channel before the next lecture (3/30). Also choose one post by another student and reply with an opinion of your own by the end of the following Friday (4/1).
Think about possible project topics. Here are some ideas. (We will talk about this in more detail after the spring break.)
In this short course, we don't have the time to introduce the mathematics needed to understand the original research. But if you want to dive deeper in the future, it can't hurt to read the introductions and conclusions of the articles and build a mental landscape which you can fill in later with the mathematical details. [4] [5]
These two papers introduce "attention" to NLP:
This paper, building on the previous two, is credited with introducing the "transformer":
BERT was the breakthrough transformer, setting new standards. GitHub.
GPT uses transformers to learn a language model: Radford et al., Language Models are Unsupervised Multitask Learners, 2019. (A short sketch of loading BERT and GPT-2 follows this list.)
T5 (Text-To-Text Transfer Transformer). Video by Colin Raffel. Google AI Blog.
Facebook's XLM-R.
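If you want to experiment with the models above, the Hugging Face transformers library provides pretrained checkpoints of BERT and GPT-2. Below is a minimal sketch, assuming the transformers package and a PyTorch backend are installed; the model identifiers are the standard ones on the Hugging Face hub.

```python
# pip install transformers torch
from transformers import pipeline

# Masked-word prediction with BERT.
fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("Attention is all you [MASK].")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))

# Open-ended text generation with GPT-2.
generate = pipeline("text-generation", model="gpt2")
print(generate("Transformers are", max_new_tokens=20)[0]["generated_text"])
```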
More Applications of Transformers beyond NLP:
Transformers beat CNNs for image recognition: Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021. (A patch-embedding sketch follows this list.)
Transformers for composing and performing music: Music Transformer, 2018. Magenta blog … demos … GitHub.
Protein folding: AlphaFold.
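To illustrate the idea behind the ViT paper above ("an image is worth 16x16 words"), here is a minimal NumPy sketch of the patch-embedding step: cut an image into 16x16 patches, flatten them, and project them to token vectors a standard Transformer can consume. The random projection here is a stand-in for the learned one in the paper.

```python
import numpy as np

def image_to_patch_tokens(image, patch=16, d_model=64, seed=2):
    # image: (H, W, C) array; H and W must be divisible by the patch size.
    H, W, C = image.shape
    patches = (image.reshape(H // patch, patch, W // patch, patch, C)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * C))   # (num_patches, 16*16*C)
    # Stand-in for the learned linear projection to the model dimension.
    projection = np.random.default_rng(seed).normal(size=(patch * patch * C, d_model)) * 0.02
    return patches @ projection                        # (num_patches, d_model) "tokens"

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)   # (196, 64): a 224x224 image becomes 196 patch tokens
```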
Much of the progress comes at an exponentially increasing cost (energy consumption, carbon footprint, etc.). So while AI has been quickly catching up with the human brain in performance, the gap in resource consumption has been widening enormously.
It is also interesting to measure how long it takes to reach a fixed point. ↩︎
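One way to measure this is sketched below; step is a hypothetical stand-in for whatever transformation is being iterated (for example, a round-trip translation), not a function from any particular library.

```python
def iterations_to_fixed_point(step, text, max_iters=50):
    # Apply `step` repeatedly and count the iterations until the output
    # stops changing, i.e. a fixed point is reached.
    for i in range(1, max_iters + 1):
        new_text = step(text)
        if new_text == text:
            return i
        text = new_text
    return None   # no fixed point within the iteration budget

# Toy stand-in for `step`: the second call confirms the fixed point, so this prints 2.
print(iterations_to_fixed_point(str.lower, "Attention Is All You Need"))
```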
(Remember that measuring similarity can be done with language models.) ↩︎
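For example, one common recipe (a sketch, assuming the sentence-transformers package and its all-MiniLM-L6-v2 checkpoint) is to embed both texts with a language model and compare the embeddings with cosine similarity:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a = model.encode("Transformers replaced recurrent networks in NLP.", convert_to_tensor=True)
b = model.encode("RNNs have largely been superseded by attention-based models.", convert_to_tensor=True)
print(float(util.cos_sim(a, b)))   # closer to 1.0 means more similar
```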
Many of the articles also contain links to git repositories. A good hands-on way to learn more is to see whether you can recreate (variations of) the experiments and results reported in the papers. This can take a lot of work, but is a great way to learn. ↩︎
If you read some of the articles try to get a sense for how much of the progress is driven by improving hardware. ↩︎