# 1. Vision Transformers (An Image is Worth 16x16 Words)

- Pretrained on 300+ million images
- SOTA on ImageNet (88.55% top-1 accuracy)
- Adds a learnable CLS token to the patch tokens; its final representation is used to predict the true label

###### tags: `vit`
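
To make the patch-token and CLS-token idea concrete, here is a minimal pure-Python sketch of how the input sequence could be built. It is an illustration under stated assumptions, not the paper's implementation: the function names (`split_into_patches`, `linear_project`, `build_token_sequence`), the tiny embedding size, and the random weights are all hypothetical, and position embeddings and the Transformer encoder itself are left out.

```python
import random


def split_into_patches(image, patch_size=16):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append this pixel's channel values
            patches.append(patch)
    return patches


def linear_project(vector, weights):
    """Multiply a length-P vector by a P x D weight matrix (list of rows)."""
    dim = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vector, weights)) for j in range(dim)]


def build_token_sequence(image, weights, cls_token, patch_size=16):
    """Embed each patch and prepend the CLS token to the resulting sequence."""
    patch_tokens = [linear_project(p, weights)
                    for p in split_into_patches(image, patch_size)]
    return [cls_token] + patch_tokens


# Toy example (assumed sizes): a 32x32 RGB image and an embedding size of 8.
rng = random.Random(0)
image = [[[rng.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(8)] for _ in range(16 * 16 * 3)]
cls_token = [0.0] * 8  # learnable in a real model; fixed here for illustration

tokens = build_token_sequence(image, weights, cls_token)
print(len(tokens))  # 4 patch tokens + 1 CLS token
```

With 16x16 patches, a 224x224 image would yield 14 x 14 = 196 patch tokens, so the sequence has 197 tokens once the CLS token is prepended. After the encoder, only the output at position 0 (the CLS position) would be passed to the classification head.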