
FLAT-Attention vs. FlashAttention

FLAT-Attention was made public on arXiv in July 2021 and published at ASPLOS in March 2023. We are aware of the concurrent work FlashAttention. In short, we take different routes to tackle the same problem. The proposed solutions are different, but the key idea is the same (tiling and scheduling). We summarize the key differences in the following. For a detailed comparison, please refer to our Colab demo.

Qualitative Comparisons


Comparisons of FLAT-Attention and FlashAttention

Tiling Strategy Comparisons


The tiling strategy difference between FLAT-Attention and FlashAttention: FlashAttention uses block tiling with a weight-stationary dataflow, while FLAT-Attention uses row tiling (row granularity) with an output-stationary dataflow.
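To make the contrast concrete, below is a minimal NumPy sketch (not code from either paper) of the two tiling styles on a single-head attention softmax(QKᵀ)V, with the 1/√d scaling omitted for brevity. The function names `attention_row_tiled` and `attention_block_tiled` and the tile sizes are illustrative assumptions: row tiling computes exact softmax over full logit rows and keeps the output tile stationary, whereas block tiling also splits the K/V dimension and folds partial results into a running output with online renormalization.

```python
import numpy as np

def attention_row_tiled(Q, K, V, rows_per_tile=4):
    """Row-granularity tiling: process a tile of query rows at a time,
    keeping the partial output stationary while streaming over all of K, V."""
    N, d = Q.shape
    O = np.empty((N, d))
    for r in range(0, N, rows_per_tile):
        q = Q[r:r + rows_per_tile]                      # (t, d) tile of query rows
        s = q @ K.T                                     # (t, N) full logit rows
        p = np.exp(s - s.max(axis=1, keepdims=True))    # exact row-wise softmax
        O[r:r + rows_per_tile] = (p / p.sum(axis=1, keepdims=True)) @ V
    return O

def attention_block_tiled(Q, K, V, block=4):
    """Block tiling: also tile over the K/V dimension and merge partial
    softmax blocks into a running output via online renormalization."""
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full((N, 1), -np.inf)                        # running row-wise max
    l = np.zeros((N, 1))                                # running softmax denominator
    for c in range(0, N, block):
        s = Q @ K[c:c + block].T                        # (N, block) logit block
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        p = np.exp(s - m_new)
        scale = np.exp(m - m_new)                       # rescale previous partials
        l = l * scale + p.sum(axis=1, keepdims=True)
        O = O * scale + p @ V[c:c + block]
        m = m_new
    return O / l

# Both tilings reproduce the reference dense computation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
S = Q @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(attention_row_tiled(Q, K, V), ref)
assert np.allclose(attention_block_tiled(Q, K, V), ref)
```

Both loops avoid materializing the full N×N attention matrix at once; the difference is what stays resident per step (a full logit row per query tile vs. a running, renormalized partial output).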

Scheduling Strategy (Dataflow) Comparisons

FlashAttention

FLAT-Attention