# Flat-Attention vs. FlashAttention

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks, ASPLOS'23

## Summary of FLAT-Attention

FLAT (Fused Logit and Attend Tiling)

The quadratic complexity of the Logit and Attend operators in the Attention layer causes two major challenges (a tiling sketch follows the list below):
- Low performance due to memory-boundedness
- A large on-chip buffer requirement for staging intermediate activations
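To make the fusion idea concrete, below is a minimal sketch (not the paper's implementation) of fusing the Logit (`Q @ K^T`) and Attend (`softmax(logits) @ V`) operators with row tiling over the query dimension. Only a `(tile_size, seq_len)` logit block is ever materialized instead of the full `(seq_len, seq_len)` attention matrix, which illustrates how fused tiling shrinks the intermediate-activation footprint. The tile size, shapes, and NumPy implementation are illustrative assumptions.

```python
import numpy as np

def fused_tiled_attention(Q, K, V, tile_size=128):
    """Fuse the Logit and Attend operators over query tiles (illustrative sketch)."""
    seq_len, d = Q.shape
    out = np.empty((seq_len, V.shape[1]), dtype=Q.dtype)
    scale = 1.0 / np.sqrt(d)
    for start in range(0, seq_len, tile_size):
        end = min(start + tile_size, seq_len)
        # Logit operator for this query tile only: shape (tile, seq_len)
        logits = (Q[start:end] @ K.T) * scale
        # Numerically stable softmax over the key dimension
        logits -= logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        # Attend operator, consumed immediately so the logit block never
        # needs to be staged as a full-size intermediate activation
        out[start:end] = probs @ V
    return out

# Example usage with random activations (hypothetical sizes)
rng = np.random.default_rng(0)
Q = rng.standard_normal((512, 64)).astype(np.float32)
K = rng.standard_normal((512, 64)).astype(np.float32)
V = rng.standard_normal((512, 64)).astype(np.float32)
y = fused_tiled_attention(Q, K, V)
print(y.shape)  # (512, 64)
```

The key design point the sketch tries to convey is that fusing the two operators turns a memory-bound sequence of large matrix writes and reads into a loop whose working set is bounded by the tile size rather than the sequence length.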