---
title: permeet 0901
---

### SOTA changes

On classification benchmarks such as ImageNet and CIFAR-10, ResNet-family architectures have nearly disappeared from the state-of-the-art leaderboards, replaced by Transformer- and EfficientNet-based models; the number of Transformer-related studies rose sharply last year.

## Transformer

### self-attention

![](https://i.imgur.com/SyP9dgp.png)

### attention in CV

![](https://i.imgur.com/1oeUqFc.png)

### multi-head attention

Split the original Q, K, V into two independent attention computations, then concatenate the results and apply a final projection to reduce the dimension.

![](https://i.imgur.com/QCvhVTo.png)

## CvT: Introducing Convolutions to Vision Transformers

### Concept: fuse convolution into the Transformer

![](https://i.imgur.com/Ufya79Q.png)

### convolutional token embedding

Each output position of the convolution becomes one token, so the number of tokens can be controlled by changing the stride.

![](https://i.imgur.com/1pOQgCU.png)

![](https://i.imgur.com/rrPYsK9.png)

### next steps

1. multi-scale
2. convolution with Transformer

### schedule

- now - 10/15, 2021: survey papers (multi-scale / fast Transformer)
- 10/16 - 10/31, 2021: build ViT & CvT models
- 11/01 - 11/30, 2021: data collection (ViT experiments)
- 12/01 - 12/31, 2021: other Transformers (Fastformer / PiT)
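
The self-attention step referenced above can be sketched as scaled dot-product attention. This is a minimal NumPy illustration, not the note's actual implementation; the function and weight names (`Wq`, `Wk`, `Wv`) are placeholders I chose:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n, n) pairwise token similarities
    return softmax(scores) @ V               # each token: weighted sum of all values
```

Every output token is a convex combination of all value vectors, which is what lets attention mix information across the whole image or sequence in one step.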
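
The convolutional token embedding idea (each convolution output position is one token, stride controls the token count) can be sketched like this. A simplified, loop-based NumPy version, not CvT's actual code; names and shapes are my assumptions:

```python
import numpy as np

def conv_token_embed(img, W, stride):
    """img: (H, W_img, C_in); W: (k, k, C_in, C_out) conv kernel, no padding.
    Each spatial output position of the convolution becomes one token."""
    k = W.shape[0]
    H, Wi, _ = img.shape
    out_h = (H - k) // stride + 1
    out_w = (Wi - k) // stride + 1
    tokens = np.empty((out_h * out_w, W.shape[-1]))
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Full contraction of the (k, k, C_in) patch with the kernel -> (C_out,)
            tokens[i * out_w + j] = np.tensordot(patch, W, axes=3)
    return tokens  # (out_h * out_w tokens, C_out)
```

Doubling the stride roughly quarters the number of tokens, which is how CvT-style models trade spatial resolution for compute between stages.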