
Deep learning has seen tremendous growth over the last decade and a half, sweeping through artificial intelligence and changing the way computers are used to solve real-world problems. I want to illustrate this change through four timelines:

  • applications
  • methodology
  • responsible use and safety
  • theory of deep learning

Timeline 1: Breakthroughs in Applications

One of the reasons deep learning is so important is the range of applications it has enabled. With it, computers were able to tackle problems that evaded previous approaches. In particular, deep learning excels in situations where natural images, video, or language have to be processed, understood, manipulated, or acted on. Here is a timeline of some significant headline breakthroughs.

  1. 2012 - AlexNet/ImageNet Challenge: AlexNet’s victory in the ImageNet competition marked the resurgence of the neural network approach in machine learning and the beginning of the dominance of convolutional networks in computer vision. The significance of this result was that a single, end-to-end trained neural network surpassed the complex web of hand-engineered methods previously used in computer vision.
  2. 2015 - DQN-Atari: Deep Q-Networks (DQN) achieved human-level performance on Atari games, showcasing the potential of deep reinforcement learning. Notably, the method learned from the raw pixels of each game rather than from an abstract representation of its objects or objectives.
  3. 2015 - Neural Machine Translation (NMT): Sequence-to-sequence models with attention significantly improved translation quality. As in vision, neural networks replaced very complicated engineered systems with lots of moving parts. NMT was soon deployed in Google Translate.
  4. 2015 - Image-to-text captioning: Models like Show, Attend and Tell combined CNNs and RNNs to generate accurate image captions, signaling the rise of multimodal AI.
  5. 2016 - WaveNet: DeepMind introduced WaveNet, revolutionizing text-to-speech synthesis with natural-sounding, human-like audio.
  6. 2016 - AlphaGo: DeepMind’s AlphaGo defeated the world champion Go player, demonstrating the potential of reinforcement learning for strategic games.
  7. 2017 - OpenAI Dota 2 (OpenAI Five): Reinforcement learning models defeated professional players in Dota 2, a complex multiplayer online battle arena game.
  8. 2019 - AlphaStar (StarCraft II): DeepMind’s AlphaStar reached Grandmaster level in StarCraft II, showcasing AI’s ability to handle real-time strategy games with long time horizons.
  9. 2019 - GPT-2: OpenAI’s GPT-2 demonstrated the potential of large-scale language models, generating coherent and contextually relevant text.
  10. 2020 - AlphaFold: DeepMind’s AlphaFold achieved breakthrough accuracy in protein structure prediction, effectively solving a decades-old challenge in molecular biology.
  11. 2021 - AlphaCode/Codex/Copilot (LLMs for text and code): Models like Codex powered applications like GitHub Copilot, revolutionizing programming by enabling natural language coding.
  12. 2022–2024 - Minerva/LLEMMA/AlphaProof/AlphaGeometry: A succession of models tackled complex mathematical reasoning, formal proofs, and problem-solving in geometry.

Timeline 2: Methods

These breakthroughs were supported by, and in turn motivated, rapid development in methodology.

Foundational Architectures (Vision and General Methods):

  1. 1998 - Basic CNNs: Yann LeCun’s work on convolutional neural networks (CNNs) for handwritten digit recognition (LeNet) established the foundation for modern vision models.
  2. 2012 - AlexNet: Revived deep learning by scaling up CNNs and using GPUs to win the ImageNet challenge.
  3. 2014 - VGG: Introduced a very deep architecture with smaller convolutional filters, setting a benchmark for simple, deep CNN designs.
  4. 2015 - ResNet: Introduced residual connections, enabling the training of very deep networks and surpassing human-level performance in image recognition (a minimal sketch of a residual block follows this list).
  5. 2020 - Vision Transformers (ViT): Applied Transformer architectures to vision tasks, achieving state-of-the-art performance by leveraging self-attention mechanisms.
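
To make the residual idea concrete, here is a minimal sketch of a residual block in plain NumPy. It is a fully-connected stand-in for the convolutional blocks used in the actual ResNet paper, and the weight matrices `W1` and `W2` are hypothetical placeholders:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Sketch of a ResNet-style residual block: the layers learn only a
    residual F(x), and the skip connection adds the input back, so
    gradients can flow through the identity path unimpeded."""
    f = relu(x @ W1) @ W2   # the learned residual F(x)
    return relu(x + f)      # skip connection: output = relu(x + F(x))
```

Because the identity path is always available, a deep stack of such blocks can at worst behave like a shallower network, which is what made hundred-layer models trainable.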

Generative Models:

  1. 2013 - Variational Autoencoders (VAE): Provided a probabilistic framework for generating data while learning latent representations.
  2. 2014 - Generative Adversarial Networks (GANs): Introduced by Ian Goodfellow and collaborators, GANs became the foundation for realistic image synthesis and adversarial training (the two-player objective is written out after this list).
  3. 2020 - Diffusion Models: Denoising Diffusion Probabilistic Models (DDPM) enabled state-of-the-art image and audio generation, later powering tools like Stable Diffusion and DALL-E 2.
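
The adversarial idea behind GANs fits in a single formula, the minimax objective from the original paper: a generator $G$ maps noise $z$ to samples and tries to fool a discriminator $D$, which in turn tries to tell real data from generated samples:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$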

Text Models:

  1. 1997 - LSTM: Long Short-Term Memory networks addressed the vanishing gradient problem, enabling RNNs to capture long-term dependencies.
  2. 2014 - seq-to-seq: For a significant time, models requiring text manipulation followed a sequence-to-sequence approach in which two LSTMs, an encoder and a decoder, were connected by a single hidden state.
  3. 2014 - Attention: The attention mechanism (originally introduced to improve LSTM-based seq-to-seq models) allowed the decoder to focus on specific parts of the input sequence, removing the main bottleneck of seq-to-seq: the reliance on a single hidden state to carry information from the encoder to the decoder.
  4. 2017 - Transformer: Attention turned out to be so important and useful that the rest of the seq-to-seq architecture, namely the LSTM, could be thrown out entirely. The famous paper "Attention Is All You Need" introduced the self-attention-based Transformer architecture, which rapidly became the backbone of nearly all modern NLP and, eventually, multimodal AI systems (the core attention computation is sketched after this list).
  5. 2022 - Let's think step by step: In the early 2020s, large language models started to cannibalize NLP, and to some degree machine learning itself. Instead of using machine learning to solve each problem directly, we used it to build massive models, LLMs, which in turn could be prompted to solve problems. The focus shifted from improving machine learning techniques to improving the prompts. "Let's think step by step" is a famous example of this prompt engineering.
  6. 2022 - RLHF (Reinforcement Learning from Human Feedback): Enabled fine-tuning of language models like GPT-3.5 and ChatGPT to align with human intent. Practically, this means fine-tuning the models on extra data derived from human reviewers rating the models' responses.
  7. 2023 - RLAIF (Reinforcement Learning from AI Feedback): Improved alignment efficiency by leveraging model-generated feedback for fine-tuning large language models.
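
As promised above, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformer. It is a single head with no masking and no learned projections, all of which a real implementation would add:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores every key,
    and the output is the attention-weighted average of the values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_queries, n_keys) similarity matrix
    return softmax(scores) @ V       # weighted sum of value vectors
```

In self-attention, `Q`, `K`, and `V` are all projections of the same sequence, which is what lets every position attend to every other position in a single step.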

Timeline 3: Responsible AI and AI Safety

As deep learning, and machine learning in general, became more widely used, often in situations where there were no guarantees it would work well, people started to focus on the various ways these methods might fail users, cause harm, or be misused.

  1. 2014 - Adversarial Examples: Szegedy et al. discovered adversarial examples, showing that small, imperceptible changes to input data could drastically alter model predictions, raising concerns about robustness (the simplest such attack is sketched after this list).
  2. 2016 - Fairness - Equality of Opportunity: The "Equality of Opportunity in Supervised Learning" paper by Hardt et al. formalized a criterion for algorithmic fairness that requires equalized error rates across groups.
  3. 2016 - Differentially Private SGD: Abadi et al. introduced techniques for training deep learning models with differential privacy, balancing privacy guarantees with utility on sensitive datasets (one update step is sketched after this list).
  4. 2016 - GradCAM: Gradient-weighted Class Activation Mapping provided visual explanations of deep learning models by highlighting the regions of the input that most influenced a prediction.
  5. 2018 - Gender Shades: Joy Buolamwini and Timnit Gebru’s study revealed biases in commercial facial recognition systems, catalyzing fairness research.
  6. 2018–2023 - Mechanistic Interpretability: Pioneered by Anthropic’s interpretability research and OpenAI’s circuits work, aiming to reverse-engineer neural network reasoning by analyzing weights and activations.
  7. 2022 - Constitutional AI: Anthropic introduced Constitutional AI to align language models with human values through rule-based principles, reducing harmful behaviors while relying far less on direct human feedback.
  8. 2022 - Safety Benchmarks (e.g., HELM): The Holistic Evaluation of Language Models formalized evaluation protocols for fairness, bias, and alignment.
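
To make item 1 concrete, here is a sketch of the fast gradient sign method (FGSM), the simple attack from Goodfellow et al.'s follow-up work on adversarial examples. The function `grad_loss_wrt_input` is a hypothetical stand-in for whatever autodiff machinery computes the gradient of the loss with respect to the input:

```python
import numpy as np

def fgsm(x, grad_loss_wrt_input, epsilon=0.01):
    """Fast gradient sign method: nudge every input dimension by epsilon
    in the direction that increases the loss. For small epsilon the change
    is imperceptible, yet it can flip the model's prediction."""
    perturbation = epsilon * np.sign(grad_loss_wrt_input(x))
    return np.clip(x + perturbation, 0.0, 1.0)  # keep pixel values in range
```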
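
And for item 3, a minimal sketch of one update step in the spirit of Abadi et al.'s DP-SGD: clip each example's gradient to bound its individual influence, then add Gaussian noise calibrated to that bound. The exact noise scaling and the privacy accounting are simplified here:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One differentially private SGD step (simplified sketch)."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]          # bound each example's influence
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(clipped),
                             size=mean_grad.shape)  # noise hides any single example
    return params - lr * (mean_grad + noise)
```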

Timeline 4: Theory of Deep Learning

When it burst onto the scene in the early 2010s, deep learning defied most of our existing theory about what makes learning algorithms work. In particular, theoretical researchers predicted that deep learning, relying on extremely large parametric models and non-convex optimization, would fail to generalize, and for a while the community dismissed mounting empirical evidence to the contrary. Eventually, the community admitted that deep learning works, but also that we probably don't understand why, and that new theory needed to be developed. To this day, our theoretical understanding of deep learning's success lags behind its applications.

  1. 2017 - Understanding Deep Learning Requires Rethinking Generalization: Zhang et al. showed that deep networks can perfectly fit random labels, yet the same architectures generalize well on real data, challenging traditional capacity-based theories of generalization.
  2. 2018 - Neural Tangent Kernel (NTK): Jacot et al. showed that deep networks in the infinite-width limit behave like kernel methods, providing insights into gradient descent dynamics (the kernel is written out after this list).
  3. 2014 - Deep Linear Models: Saxe et al. studied the learning dynamics of deep linear networks, which simplify analysis while retaining key properties of deep models.
  4. 2019 - Tensor Programs: Greg Yang's Tensor Programs series extended the analysis of infinitely wide neural networks and yielded practical insights into hyperparameter tuning and transfer in large models.
  5. 2018–2020 - SGD Implicit Bias: Research showed that stochastic gradient descent (SGD) biases models toward solutions with favorable generalization properties.
  6. 2019 - Double Descent: Revealed that test error can decrease again as model capacity grows beyond the interpolation threshold, after the classical U-shaped regime, challenging the textbook bias-variance trade-off.
  7. 2019 - Lottery Ticket Hypothesis: Frankle and Carbin found that large networks contain sparse sub-networks ("winning tickets") that match the performance of the full model.
  8. 2020 - Scaling Laws: OpenAI and, later, DeepMind demonstrated that model performance scales predictably with size, data, and compute, following simple power laws (the functional form is shown after this list).
  9. 2021 - Recursive Training: Research explored whether models can generate high-quality data for their own further training, creating a virtuous cycle of performance improvement.
  10. 2021 - Grokking: Power et al. observed that models sometimes generalize long after fitting the training data, highlighting the interplay of optimization and regularization.
  11. 2022 - Transcendence (Beyond Memorization): Work explored neural networks' ability to extrapolate rules beyond their training distributions.
  12. 2022 - Theory of In-context Learning in LLMs: Explored how large language models encode and utilize training distribution statistics to perform implicit meta-learning.
  13. 2023 - Extrapolation and Transcendence in LLMs: Research started to formalize how LLMs might generalize beyond their training data by relying on a stronger form of compositional generalization. The term "transcendence" was recently coined to describe cases where language models surpass the quality of their training data when making predictions.
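
For item 2, the object at the center of the NTK analysis is the following kernel, evaluated at the network's initialization $\theta_0$. In the infinite-width limit the kernel stays constant during training, so gradient descent on the network behaves like kernel regression with $\Theta$:

$$
\Theta(x, x') = \left\langle \nabla_\theta f(x; \theta_0),\; \nabla_\theta f(x'; \theta_0) \right\rangle
$$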
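
And for item 8, the scaling-law papers report that test loss falls off as a power law in each resource. For example, as a function of parameter count $N$, Kaplan et al. fit the form below, where $N_c$ and $\alpha_N$ are empirically fitted constants:

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
$$

Analogous laws hold for dataset size and training compute, which is what makes the performance of larger models predictable in advance.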