FLAT-Attention was made public on arXiv in July 2021 and published at ASPLOS in March 2023. We are aware of the concurrent work FlashAttention. In short, we take a different route to tackle the same problem: the proposed solutions differ, but the key idea (tiling and scheduling) is the same. We summarize the key differences below. For a detailed comparison, please refer to our Colab demo.
Comparisons of FLAT-Attention and FlashAttention
Tiling strategy: FlashAttention uses block tiling with a weight-stationary dataflow, whereas FLAT-Attention uses row tiling (row granularity) with an output-stationary dataflow.
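To make the contrast concrete, here is a minimal NumPy sketch (not the actual FLAT or FlashAttention kernels; the function names and tile sizes are illustrative assumptions) of the two schedules. The row-granularity, output-stationary schedule computes each output row exactly once, while the block-tiled schedule streams K/V blocks and must keep running softmax statistics to rescale partially accumulated outputs.

```python
import numpy as np

def row_tiled_output_stationary(Q, K, V, rows_per_tile=4):
    """Illustrative row-granularity, output-stationary schedule:
    each tile of query rows sees its full logit row, so the softmax is
    exact and the output is written once without rescaling."""
    N, d = Q.shape
    O = np.empty_like(Q)
    for r in range(0, N, rows_per_tile):
        q = Q[r:r + rows_per_tile]                 # tile of query rows
        logits = q @ K.T / np.sqrt(d)              # full logit row on chip
        p = np.exp(logits - logits.max(-1, keepdims=True))
        p /= p.sum(-1, keepdims=True)              # exact softmax per row
        O[r:r + rows_per_tile] = p @ V             # output stays stationary
    return O

def block_tiled_streaming_kv(Q, K, V, block=4):
    """Illustrative block-tiled schedule (simplified: only K/V are tiled):
    partial logits per block require online softmax statistics (running
    max and denominator) and rescaling of the accumulated output."""
    N, d = Q.shape
    O = np.zeros_like(Q)
    m = np.full((N, 1), -np.inf)                   # running row max
    l = np.zeros((N, 1))                           # running softmax denominator
    for c in range(0, N, block):
        k, v = K[c:c + block], V[c:c + block]
        s = Q @ k.T / np.sqrt(d)                   # partial logits for this block
        m_new = np.maximum(m, s.max(-1, keepdims=True))
        p = np.exp(s - m_new)
        correction = np.exp(m - m_new)             # rescale factor for old partials
        l = l * correction + p.sum(-1, keepdims=True)
        O = O * correction + p @ v                 # rescale accumulated output
        m = m_new
    return O / l

# Both schedules compute the same attention output up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(row_tiled_output_stationary(Q, K, V),
                   block_tiled_streaming_kv(Q, K, V))
```

The sketch highlights the trade-off behind the two dataflows: keeping the output stationary avoids the rescaling bookkeeping, while block tiling over K/V trades that bookkeeping for smaller working tiles.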