Introduction to Transformers

We spent the main part of the lecture on the demos, in order to give you a sense of how far the state of the art in AI has advanced in the last few years. But we start with a short overview of some technical background.

Technical Background

Transformers are a simplification ("attention is all you need") of earlier deep neural network (DNN) architectures for natural language processing (and other tasks, including image processing).

In class we gave a quick introduction to DNNs and backpropagation and hinted at the differences between previous architectures known as recurrent neural networks (RNNs) and Transformers.
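
To make the idea of "attention" concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. This is only an illustration, not the code from the paper: a real Transformer computes queries, keys, and values with learned projection matrices, uses multiple attention heads, and stacks many layers.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                                        # weighted average of the values

    # Toy example: 3 tokens with embedding dimension 4.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    # In a real Transformer, Q, K and V are learned projections of x;
    # here we use x directly to keep the sketch short.
    print(scaled_dot_product_attention(x, x, x))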

Demos

Thinking About the Future of AI

Given the amazing progress we have seen in the demos, what is the future of AI (and humanity)?

Here are some topics to think about. Add your own.

  • How will AI influence the job market?
  • Artificial General Intelligence (AGI): Will software become more intelligent than humans? If yes, when will this happen? What will be the consequences for us humans? What is the AI singularity?
  • Where will the AI arms race that has started between countries such as the US and China take us?
  • What is the environmental impact of AI? [1]

Homework

Choose one of the prompts below and post on the Slack channel before the next lecture (3/30). Also choose one post by another student and reply with an opinion of your own by the end of the following Friday (4/1).

  • A link to an article about a recent application of AI that you find interesting, with a short opinion of your own.
  • A link to an article about the future of AI, with a short opinion of your own.

Think about possible project topics. Here are some ideas. (We will talk about this in more detail after the spring break.)

  • Use Grammatical Framework to build a recipe translator.
  • Use a language model for one of various NLP tasks such as sentiment analysis or topic detection (see the sketch after this list).
  • Can we measure the distance between languages by using Google Translate (or OpenAI) to translate back and forth between two languages and then measuring the similarity of the result to the original? [2] [3]
  • (let us know about your ideas)
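
As a minimal starting point for the language-model idea above (the sentiment-analysis item), here is a sketch using the Hugging Face transformers pipeline. The example sentences are invented, and the default model the pipeline downloads may change between library versions.

    # Requires: pip install transformers (plus a backend such as PyTorch).
    from transformers import pipeline

    # With no model specified, the pipeline downloads a default pretrained
    # English sentiment model; the exact model and labels may vary over time.
    classifier = pipeline("sentiment-analysis")

    reviews = [
        "The new translation feature works surprisingly well.",
        "The demo crashed twice and the results were disappointing.",
    ]
    for review in reviews:
        result = classifier(review)[0]   # e.g. {'label': 'POSITIVE', 'score': 0.99}
        print(result["label"], round(result["score"], 2), review)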

Further Sources

Introductions and Examples

More Examples

Introductory Technical Background

Research

In this short course, we don't have the time to introduce the mathematics needed to understand the original research. But if you want to dive deeper in the future, it can't hurt to read the introductions and conclusions of the articles and build a mental landscape that you can fill in later with the mathematical details. [4] [5]

Theory of Transformers

These two papers introduce "attention" to NLP:

This paper, building on the previous two, is credited with introducing the "transformer":

  • Vaswani et al., Attention Is All You Need, 2017. See also the Annotated Transformer on GitHub. "In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs."

BERT was the breakthrough transformer, setting new standards across NLP benchmarks. GitHub.

GPT uses transformers to learn a language model. Radford et al., Language Models are Unsupervised Multitask Learners, 2019.
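
To make "learning a language model" concrete, here is a minimal sketch that samples a continuation from the publicly released GPT-2 checkpoint via the Hugging Face transformers library (not OpenAI's original code). The prompt is just an example, and the generated text varies from run to run.

    # Requires: pip install transformers (plus a backend such as PyTorch).
    from transformers import pipeline

    # "gpt2" is the small released checkpoint of the GPT-2 family.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Transformers changed natural language processing because"
    outputs = generator(prompt, max_length=60, num_return_sequences=1)
    print(outputs[0]["generated_text"])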

T5 (Text-To-Text Transfer Transformer). Video by Colin Raffel. Google AI Blog.

Facebook's XLM-R.

Applications of Transformers


  1. Much of the progress comes at an exponentially increasing cost (energy consumption, carbon footprint, etc.). So while AI has been quickly catching up with the human brain in performance, the gap in resource consumption has been widening enormously. ↩︎
  2. It is also interesting to measure how long it takes to reach a fixed point. ↩︎

  3. (Remember that measuring similarity can be done with language models.) ↩︎

  4. Many of the articles also contain links to git repositories. A good hands-on way to learn more is to see whether you can recreate (variations of) the experiments and results reported in the papers. This can take a lot of work, but is a great way to learn. ↩︎

  5. If you read some of the articles, try to get a sense of how much of the progress is driven by improvements in hardware. ↩︎