# Flat-Attention vs. FlashAttention

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks, ASPLOS'23

## Summary of FLAT-Attention

FLAT (Fused Logit and Attend Tiling)

The quadratic complexity of the Logit and Attend operators in the Attention layer causes two major challenges (a tiling sketch follows the list below):
- Low performance due to memory-boundedness
- A large on-chip buffer requirement for staging intermediate activations
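To make the fusion idea concrete, below is a minimal sketch (not the paper's implementation) of fusing the Logit (`Q @ K^T`) and Attend (`softmax(logits) @ V`) operators with row tiling over the query dimension. Only a `(tile_size, seq_len)` logit block is ever materialized instead of the full `(seq_len, seq_len)` attention matrix, which illustrates how fused tiling shrinks the intermediate-activation footprint. The tile size, shapes, and NumPy implementation are illustrative assumptions.

```python
import numpy as np

def fused_tiled_attention(Q, K, V, tile_size=128):
    """Fuse the Logit and Attend operators over query tiles (illustrative sketch)."""
    seq_len, d = Q.shape
    out = np.empty((seq_len, V.shape[1]), dtype=Q.dtype)
    scale = 1.0 / np.sqrt(d)
    for start in range(0, seq_len, tile_size):
        end = min(start + tile_size, seq_len)
        # Logit operator for this query tile only: shape (tile, seq_len)
        logits = (Q[start:end] @ K.T) * scale
        # Numerically stable softmax over the key dimension
        logits -= logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        # Attend operator, consumed immediately so the logit block never
        # needs to be staged as a full-size intermediate activation
        out[start:end] = probs @ V
    return out

# Example usage with random activations (hypothetical sizes)
rng = np.random.default_rng(0)
Q = rng.standard_normal((512, 64)).astype(np.float32)
K = rng.standard_normal((512, 64)).astype(np.float32)
V = rng.standard_normal((512, 64)).astype(np.float32)
y = fused_tiled_attention(Q, K, V)
print(y.shape)  # (512, 64)
```

The key design point the sketch tries to convey is that fusing the two operators turns a memory-bound sequence of large matrix writes and reads into a loop whose working set is bounded by the tile size rather than the sequence length.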