
1. Vision Transformers (An image is worth 16x16 words)

  • Pretrained on 300+ million images (the JFT-300M dataset)
  • SOTA on ImageNet at the time of publication (88.55% top-1 accuracy)
  • A learnable CLS token is prepended to the patch tokens => its final-layer output is fed to the classification head, which predicts the label
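
The patch-plus-CLS idea can be sketched in a few lines of NumPy. This is only a minimal illustration, not the paper's implementation: the helper names (`patchify`, `embed_with_cls`), the small width `d_model = 64`, the 10-class head, and the random weights are all assumptions made for the example. The transformer encoder and the position embeddings are left out.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def embed_with_cls(image, w_embed, cls_token, patch=16):
    """Project each patch to a token and prepend the CLS token."""
    patches = patchify(image, patch)   # (N, patch*patch*C)
    tokens = patches @ w_embed         # (N, D) linear patch embedding
    return np.concatenate([cls_token[None, :], tokens], axis=0)  # (N + 1, D)

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))   # stand-in for a real image
d_model, num_classes = 64, 10                # hypothetical sizes for the sketch
w_embed = rng.standard_normal((16 * 16 * 3, d_model)) * 0.02
cls_token = np.zeros(d_model)                # learnable in a real model
w_head = rng.standard_normal((d_model, num_classes)) * 0.02

sequence = embed_with_cls(image, w_embed, cls_token)
# A real ViT would add position embeddings and run the encoder here.
logits = sequence[0] @ w_head                # only the CLS position feeds the head
print(sequence.shape)                        # (197, 64)
```

A 224x224 image with 16x16 patches gives 14 x 14 = 196 patch tokens, so the sequence has 197 tokens once the CLS token is prepended. Only index 0 (the CLS position) is used for classification.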
tags: vit