{%hackmd SybccZ6XD %}
###### tags: `paper`
# Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks
Goal
> Decrease computational complexity.
>
> Mine potential relationships across the whole dataset.
Self-attention
> $A = \text{softmax}(QK^{\top}), \quad F_{out} = AV$, where $Q = FW_q$, $K = FW_k$, $V = FW_v$.
>
> The $N \times N$ attention map gives $O(N^2 d)$ complexity for $N$ features of dimension $d$.
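The standard self-attention described above can be sketched in NumPy (a minimal sketch of the generic formulation; the matrix shapes and scaling factor here are illustrative choices, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(F, Wq, Wk, Wv):
    """Self-attention: A = softmax(Q K^T), out = A V.
    F: (N, d) input features; Wq/Wk/Wv: (d, d) projections.
    The (N, N) attention map makes this O(N^2 * d)."""
    Q, K, V = F @ Wq, F @ Wk, F @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (N, N)
    return A @ V

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 8))                      # N=5, d=8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(F, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

The quadratic cost in $N$ is exactly what external attention below is designed to remove.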
Simplified self-attention
> $A = \text{softmax}(FF^{\top}), \quad F_{out} = AF$
>
> Uses the input feature $F$ directly as queries, keys, and values; the linear projections are removed, but the complexity is still $O(N^2 d)$.
External-attention
> $A = \text{Norm}(FM^{\top}), \quad F_{out} = AM$
>
> $M$ is a learnable $S \times d$ external memory, independent of the input and shared across the whole dataset (split into $M_k$ and $M_v$ in practice). The complexity drops to $O(NSd)$, linear in $N$. Norm is the proposed double normalization: softmax over the $N$ axis followed by $\ell_1$ normalization over the $S$ axis.
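A minimal NumPy sketch of external attention with the double normalization (shapes are illustrative; the memory size $S$ and feature dimension $d$ are free parameters):

```python
import numpy as np

def external_attention(F, Mk, Mv):
    """External attention: A = DoubleNorm(F Mk^T), out = A Mv.
    F: (N, d) input; Mk, Mv: (S, d) learnable external memories
    shared across the dataset. Cost is O(N * S * d), linear in N."""
    A = F @ Mk.T                                    # (N, S) similarities
    A = np.exp(A - A.max(axis=0, keepdims=True))
    A = A / A.sum(axis=0, keepdims=True)            # softmax over N axis
    A = A / (A.sum(axis=1, keepdims=True) + 1e-9)   # l1-norm over S axis
    return A @ Mv

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 8))    # N=5 features, d=8
Mk = rng.standard_normal((4, 8))   # S=4 memory rows
Mv = rng.standard_normal((4, 8))
out = external_attention(F, Mk, Mv)
print(out.shape)  # (5, 8)
```

Because $M_k$ and $M_v$ are parameters of the layer rather than functions of the input, the same memory attends to every sample in the dataset.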
How to mine potential relationships across the whole dataset?
> Self-attention only considers relations between elements within a single data sample and ignores potential relationships between elements in different samples.
>
> External attention: $\alpha_{i,j}$ measures the similarity between the $i$-th input feature and the $j$-th row of $M$; since $M$ is learned from and shared across the whole dataset, it implicitly encodes dataset-level relations.
Ablation study on PASCAL VOC dataset
> Simply swapping in EA does not improve the results by itself; the normalization also had to be adjusted.
> 
Ablation Study on Different Multi-Head Mechanism on ImageNet Dataset
> This shows that the multi-head mechanism increases the parameter count.
> 
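A hypothetical sketch of how a multi-head variant of external attention could split channels into heads (the head layout and shared-memory choice here are illustrative assumptions, not necessarily the paper's exact design), which also makes the parameter growth visible:

```python
import numpy as np

def multi_head_external_attention(F, Mk, Mv, heads):
    """Illustrative multi-head external attention: split the d channels
    into `heads` groups and run external attention per head with
    memories Mk, Mv of shape (S, d // heads) shared across heads."""
    N, d = F.shape
    dh = d // heads
    outs = []
    for h in range(heads):
        Fh = F[:, h * dh:(h + 1) * dh]              # (N, dh) head slice
        A = Fh @ Mk.T                               # (N, S)
        A = np.exp(A - A.max(axis=0, keepdims=True))
        A = A / A.sum(axis=0, keepdims=True)        # softmax over N
        A = A / (A.sum(axis=1, keepdims=True) + 1e-9)  # l1-norm over S
        outs.append(A @ Mv)                         # (N, dh)
    return np.concatenate(outs, axis=1)             # (N, d)

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 8))    # N=5, d=8
Mk = rng.standard_normal((4, 4))   # S=4, d_head=4 for heads=2
Mv = rng.standard_normal((4, 4))
out = multi_head_external_attention(F, Mk, Mv, heads=2)
print(out.shape)  # (5, 8)
```

If the memories were instead kept per head (or the memory size $S$ scaled with the head count), the memory parameters would grow with the number of heads, consistent with the ablation's observation that more heads means more parameters.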
Experiment
> image classification, object detection, semantic segmentation, instance segmentation, image generation, point cloud classification, and point cloud segmentation tasks