
FLAT-Attention vs. FlashAttention

FLAT-Attention was made public on arXiv in July 2021 and published at ASPLOS in March 2023. We are aware of the concurrent work FlashAttention. In short, we take different routes to tackle the same problem. The proposed solutions are different, but the key idea is the same (tiling and scheduling). We summarize the key differences in the following. For a detailed comparison, please refer to our Colab demo.

Qualitative Comparisons


Comparisons of FLAT-Attention and FlashAttention

Tiling Strategy Comparisons


The tiling strategy difference between FLAT-Attention and FlashAttention: FlashAttention uses block tiling with a weight-stationary dataflow, while FLAT-Attention uses row tiling (row granularity) with an output-stationary dataflow.
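To make the contrast concrete, below is a minimal NumPy sketch (not code from either paper) of the two tiling styles on a single-head attention softmax(QKᵀ)V, with the 1/√d scaling omitted for brevity. The function names `attention_row_tiled` and `attention_block_tiled` and the tile sizes are illustrative assumptions: row tiling computes exact softmax over full logit rows and keeps the output tile stationary, whereas block tiling also splits the K/V dimension and folds partial results into a running output with online renormalization.

```python
import numpy as np

def attention_row_tiled(Q, K, V, rows_per_tile=4):
    """Row-granularity tiling: process a tile of query rows at a time,
    keeping the partial output stationary while streaming over all of K, V."""
    N, d = Q.shape
    O = np.empty((N, d))
    for r in range(0, N, rows_per_tile):
        q = Q[r:r + rows_per_tile]                      # (t, d) tile of query rows
        s = q @ K.T                                     # (t, N) full logit rows
        p = np.exp(s - s.max(axis=1, keepdims=True))    # exact row-wise softmax
        O[r:r + rows_per_tile] = (p / p.sum(axis=1, keepdims=True)) @ V
    return O

def attention_block_tiled(Q, K, V, block=4):
    """Block tiling: also tile over the K/V dimension and merge partial
    softmax blocks into a running output via online renormalization."""
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full((N, 1), -np.inf)                        # running row-wise max
    l = np.zeros((N, 1))                                # running softmax denominator
    for c in range(0, N, block):
        s = Q @ K[c:c + block].T                        # (N, block) logit block
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        p = np.exp(s - m_new)
        scale = np.exp(m - m_new)                       # rescale previous partials
        l = l * scale + p.sum(axis=1, keepdims=True)
        O = O * scale + p @ V[c:c + block]
        m = m_new
    return O / l

# Both tilings reproduce the reference dense computation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
S = Q @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(attention_row_tiled(Q, K, V), ref)
assert np.allclose(attention_block_tiled(Q, K, V), ref)
```

Both loops avoid materializing the full N×N attention matrix at once; the difference is what stays resident per step (a full logit row per query tile vs. a running, renormalized partial output).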

Scheduling Strategy (Dataflow) Comparisons

FlashAttention

FLAT-Attention