Introduction to Transformers

We spent the main part of the lecture on the demos, in order to give you a sense of how far the state of the art in AI has advanced in the last few years. But we start with a short overview of some technical background.

Technical Background

Transformers are a simplification ("attention is all you need") of earlier deep neural network (DNN) architectures for natural language processing (and other tasks, including image processing).

In class we gave a quick introduction to DNNs and backpropagation and hinted at the differences between previous architectures known as recurrent neural networks (RNNs) and Transformers.
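
To make the idea of "attention" concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. This is only an illustration, not the code from the paper: a real Transformer computes queries, keys, and values with learned projection matrices, uses multiple attention heads, and stacks many layers.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V                                        # weighted average of the values

    # Toy example: 3 tokens with embedding dimension 4.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    # In a real Transformer, Q, K and V are learned projections of x;
    # here we use x directly to keep the sketch short.
    print(scaled_dot_product_attention(x, x, x))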

Demos

Thinking About the Future of AI

Given the amazing progress we have seen in the demos, what is the future of AI (and humanity)?

Here are some topics to think about. Add your own.

  • How will AI influence the job market?
  • Artificial General Intelligence (AGI): Will software become more intelligent than humans? If yes, when will this happen? What will be the consequences for us humans? What is the AI singularity?
  • Where will the AI arms race that has started between countries such as the US and China take us?
  • What is the environmental impact of AI? [1]

Homework

Choose one of the prompts below and post on the Slack channel before the next lecture (3/30). Also choose one post by another student and reply with an opinion of your own by the end of the following Friday (4/1).

  • A link to an article about a recent application of AI that you find interesting, with a short opinion of your own.
  • A link to an article about the future of AI, with a short opinion of your own.

Think about possible project topics. Here are some ideas. (We will talk about this in more detail after the spring break.)

  • Use Grammatical Framework to build a recipe translator.
  • Use a language model for one of various NLP tasks such as sentiment analysis or topic detection (see the sketch after this list).
  • Can we measure the distance between languages by using Google Translate (or OpenAI) to translate back and forth between two languages and then measuring the similarity of the result to the original? [2] [3]
  • (let us know about your ideas)
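
As a minimal starting point for the language-model idea above (the sentiment-analysis item), here is a sketch using the Hugging Face transformers pipeline. The example sentences are invented, and the default model the pipeline downloads may change between library versions.

    # Requires: pip install transformers (plus a backend such as PyTorch).
    from transformers import pipeline

    # With no model specified, the pipeline downloads a default pretrained
    # English sentiment model; the exact model and labels may vary over time.
    classifier = pipeline("sentiment-analysis")

    reviews = [
        "The new translation feature works surprisingly well.",
        "The demo crashed twice and the results were disappointing.",
    ]
    for review in reviews:
        result = classifier(review)[0]   # e.g. {'label': 'POSITIVE', 'score': 0.99}
        print(result["label"], round(result["score"], 2), review)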

Further Sources

Introductions and Examples

More Examples

Introductory Technical Background

Research

In this short course, we don't have the time to introduce the mathematics needed to understand the original research. But if you want to dive deeper in the future, it can't hurt to read the introductions and conclusions of the articles and build a mental landscape that you can fill in later with the mathematical details. [4] [5]

Theory of Transformers

These two papers introduce "attention" to NLP:

This paper, building on the previous two, is credited with introducing the "transformer":

  • Vaswani et al., Attention Is All You Need, 2017. See also the Annotated Transformer on GitHub. "In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs."

BERT was the breakthrough transformer, setting new standards across NLP benchmarks. GitHub.

GPT uses transformers to learn a language model. Radford et al., Language Models are Unsupervised Multitask Learners, 2019.
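
To make "learning a language model" concrete, here is a minimal sketch that samples a continuation from the publicly released GPT-2 checkpoint via the Hugging Face transformers library (not OpenAI's original code). The prompt is just an example, and the generated text varies from run to run.

    # Requires: pip install transformers (plus a backend such as PyTorch).
    from transformers import pipeline

    # "gpt2" is the small released checkpoint of the GPT-2 family.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Transformers changed natural language processing because"
    outputs = generator(prompt, max_length=60, num_return_sequences=1)
    print(outputs[0]["generated_text"])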

T5 (Text-To-Text Transfer Transformer). Video by Colin Raffel. Google AI Blog.

Facebook's XLM-R.

Applications of Transformers


  1. Much of the progress comes at an exponentially increasing cost (energy consumption, carbon footprint, etc.). So while AI has been quickly catching up with the human brain in performance, the gap in resource consumption has been widening enormously. ↩︎
  2. It is also interesting to measure how long it takes to reach a fixed point. ↩︎

  3. (Remember that measuring similarity can be done with language models.) ↩︎

  4. Many of the articles also contain links to git repositories. A good hands-on way to learn more is to see whether you can recreate (variations of) the experiments and results reported in the papers. This can take a lot of work, but is a great way to learn. ↩︎

  5. If you read some of the articles, try to get a sense of how much of the progress is driven by improvements in hardware. ↩︎