FLAT-Attention vs. FlashAttention
====
[FLAT-Attention](https://arxiv.org/abs/2107.06419) was posted on arXiv in July 2021 and published at ASPLOS in March 2023. We are aware of the concurrent work [FlashAttention](https://arxiv.org/abs/2205.14135). In short, we take a different route to tackle the same problem. The proposed solutions are different, but the key idea is the same: **tiling** and **scheduling**. We summarize the key differences below. For the detailed differences, please refer to our Colab demo.
### Qualitative Comparisons

*Comparisons of FLAT-Attention and FlashAttention*
### Tiling Strategy Comparisons

*The tiling strategy difference between FLAT-Attention and FlashAttention. FlashAttention uses block tiling with a weight-stationary dataflow. FLAT-Attention uses row tiling (row granularity) with an output-stationary dataflow.*
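As a baseline for comparing the two tiling strategies, here is a minimal NumPy sketch of the untiled attention computation that both approaches decompose (the function name is ours, for illustration only):

```python
import numpy as np

def attention_reference(Q, K, V):
    # Untiled attention: materializes the full N x N score matrix,
    # which is exactly what both tiling strategies avoid holding
    # in on-chip memory at once.
    S = Q @ K.T                                    # N x N scores
    P = np.exp(S - S.max(axis=-1, keepdims=True))  # numerically stable softmax
    return (P / P.sum(axis=-1, keepdims=True)) @ V
```

Both tiled variants below must reproduce this result exactly; they differ only in how the score matrix is partitioned and scheduled.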
### Scheduling Strategy (Dataflow) Comparisons
#### *FlashAttention*

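A minimal NumPy sketch of the FlashAttention-style dataflow (our illustration, not the authors' implementation; the block size is an arbitrary choice): K and V are streamed block by block, so no row of the score matrix is ever complete, and the running max and normalizer must rescale previously accumulated output at every step (online softmax).

```python
import numpy as np

def flash_attention_sketch(Q, K, V, block=4):
    # Block tiling over K/V with online softmax (FlashAttention-style sketch).
    N = Q.shape[0]
    O = np.zeros((N, V.shape[1]))
    m = np.full((N, 1), -np.inf)   # running row-wise max of scores
    l = np.zeros((N, 1))           # running softmax normalizer
    for c in range(0, K.shape[0], block):
        S = Q @ K[c:c+block].T                          # scores vs. one K block
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)                       # rescale old partial sums
        P = np.exp(S - m_new)
        l = scale * l + P.sum(axis=-1, keepdims=True)
        O = scale * O + P @ V[c:c+block]
        m = m_new
    return O / l
```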
#### *FLAT-Attention*

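A minimal NumPy sketch of the FLAT-style dataflow (our illustration; the tile size is an arbitrary choice, not from the paper): each tile covers complete rows of the score matrix, so the softmax over a row is computed exactly in one pass and the corresponding output rows are finished in place (output stationary), with no rescaling of partial results.

```python
import numpy as np

def flat_attention_sketch(Q, K, V, rows_per_tile=2):
    # Row-granularity tiling over Q with an output-stationary dataflow
    # (FLAT-style sketch).
    N = Q.shape[0]
    O = np.empty((N, V.shape[1]))
    for r in range(0, N, rows_per_tile):
        S = Q[r:r+rows_per_tile] @ K.T        # full score rows for this tile
        P = np.exp(S - S.max(axis=-1, keepdims=True))
        O[r:r+rows_per_tile] = (P / P.sum(axis=-1, keepdims=True)) @ V
    return O
```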
