FLAT-Attention was made public on arXiv in July 2021 and published at ASPLOS in March 2023. We are aware of the concurrent work FlashAttention. In short, we take a different route to tackle the same problem: the proposed solutions differ, but the key idea is the same (tiling and scheduling). We summarize the key differences below; for a detailed comparison, please refer to our Colab demo.

## Qualitative Comparisons of FLAT-Attention and FlashAttention

### Tiling Strategy Comparison

The two approaches differ in tiling strategy: FlashAttention uses block tiling with a weight-stationary dataflow, while FLAT-Attention uses row tiling (row granularity) with an output-stationary dataflow.
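To make the tiling contrast concrete, here is a simplified NumPy sketch (not the actual FLAT or FlashAttention implementations, which run on accelerators and manage on-chip memory explicitly). Row tiling computes complete softmax rows per tile, so each output tile is final once produced (output stationary); block tiling sees the score matrix one block of keys at a time, so it must accumulate the softmax online with running max/sum statistics. Function names and tile sizes here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_reference(Q, K, V):
    # Plain attention: full N x N score matrix materialized at once.
    return softmax(Q @ K.T) @ V

def attention_row_tiled(Q, K, V, rows_per_tile=4):
    # Row-granularity tiling (FLAT-style): each tile produces complete
    # rows of scores, so the softmax is exact per tile and the output
    # tile never needs to be revisited (output stationary).
    N = Q.shape[0]
    out = np.empty((N, V.shape[1]))
    for r in range(0, N, rows_per_tile):
        S = Q[r:r + rows_per_tile] @ K.T        # full score rows
        out[r:r + rows_per_tile] = softmax(S) @ V
    return out

def attention_block_tiled(Q, K, V, block=4):
    # Block tiling (FlashAttention-style, simplified to 1D key blocks):
    # scores arrive a block of keys at a time, so the row softmax is
    # accumulated online with running max (m) and running sum (l).
    N = Q.shape[0]
    out = np.zeros((N, V.shape[1]))
    m = np.full((N, 1), -np.inf)   # running row max
    l = np.zeros((N, 1))           # running row sum of exp
    for c in range(0, N, block):
        S = Q @ K[c:c + block].T
        m_new = np.maximum(m, S.max(axis=1, keepdims=True))
        scale = np.exp(m - m_new)              # rescale old accumulators
        p = np.exp(S - m_new)
        l = l * scale + p.sum(axis=1, keepdims=True)
        out = out * scale + p @ V[c:c + block]
        m = m_new
    return out / l
```

Both tilings reproduce the reference result exactly; the difference is in when partial outputs become final and which operands stay resident across tile iterations.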