# Lego Accelerator
###### tags: `Accelerator`
###### members: @李京叡 @周雨軒
## Topic
Reconfigurable data flow systolic array for multi-tenant neural networks
## Introduction
* [Project Overview](/NzrfrYPXQzeW8SxKLrDBoA)
* Multi-tenant neural network workloads consist of many layers with varying shapes, and different data flow stationary policies yield distinct latency, PE utilization, and energy consumption. Improving the systolic array architecture is therefore necessary to meet QoS while maximizing PE utilization for multi-tenant neural networks; the sketch below shows how strongly layer shape drives utilization under a single fixed data flow.
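As a quick illustration, here is a minimal Python sketch, assuming a weight-stationary mapping and a first-order fill/stream/drain cost model of our own (in the spirit of SCALE-Sim-style analytical models, not its exact equations; the layer shapes are hypothetical):

```python
import math

# Back-of-the-envelope weight-stationary (WS) model: our own simplification,
# not SCALE-Sim's exact equations. A conv layer is viewed as a GEMM with
# Sr output pixels, Sc filters, and a reduction of length
# T = channels * kernel_h * kernel_w.

def ws_cycles(R, C, Sr, Sc, T):
    """Estimated runtime on an R x C array: each weight tile is loaded
    (fill), streamed over all Sr activations, and drained."""
    folds = math.ceil(T / R) * math.ceil(Sc / C)  # weight tiles needed
    return folds * (2 * R + C + Sr - 2)           # fill + stream + drain

def ws_pe_utilization(R, C, Sc, T):
    """Average fraction of PEs holding a useful weight across folds."""
    folds = math.ceil(T / R) * math.ceil(Sc / C)
    return (T * Sc) / (folds * R * C)

# Hypothetical ResNet-like layers on a 128 x 128 array:
layers = {
    "early (few filters)": dict(Sr=112 * 112, Sc=64, T=3 * 7 * 7),
    "late (many filters)": dict(Sr=7 * 7, Sc=512, T=512 * 3 * 3),
}
for name, p in layers.items():
    print(f"{name}: cycles={ws_cycles(128, 128, **p):,}, "
          f"util={ws_pe_utilization(128, 128, p['Sc'], p['T']):.2f}")
```

Under this toy model the early layer occupies under a third of the PEs while the late layer fills them all; that gap is what a reconfigurable multi-dataflow array aims to close.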
## Home Page
* [Experiment data spreadsheet](https://docs.google.com/spreadsheets/d/1GepojUT6YWgCV0jzNWfvZTtXW3dvWobfE32EKUBp84g/edit?usp=sharing)
* [Old Notes](/NzrfrYPXQzeW8SxKLrDBoA)
* [Planaria simulator](https://github.com/he-actlab/planaria.code)
## Weekly Progress
* [2022-05-20 ~ 2022-05-26](/KDOouuFkSpiAjmuXn_U-cg)
* [2022-05-13 ~ 2022-05-19](/hD-NHuuaTA6FN_Txdk_tbw)
* [2022-05-06 ~ 2022-05-12](/dy9hWfI_TKy00cCPzxOJyQ)
* [2022-04-29 ~ 2022-05-05](/n-ey_3IlScaUNy90h83G4g)
* [2022-04-22 ~ 2022-04-28](/uaQIM_KyRX6XhYgYIus-Zg)
* [2022-04-15 ~ 2022-04-21](/_vAa9q0NQd-e9gBlFlusEA)
* [2022-04-08 ~ 2022-04-14](/GmfOUXD0QfCtuCPyMGVR1g)
* [2022-04-01 ~ 2022-04-07](/3FqZt-YcR1O0HJXjFVuw9Q)
* [2022-03-25 ~ 2022-03-31](/-MU1-aRLSBOwH_GOlZCTdA)
* [2022-03-18 ~ 2022-03-24](/W9A_6QPZQPe-Jb_NQmMrmg)
* [2022-03-11 ~ 2022-03-17](/UyB1Kq14QJ-TJoWPTNu-ZA)
* [2022-03-04 ~ 2022-03-10](/AiLMxoxNTjS4lpeFXClYxQ)
* [2022-02-24 ~ 2022-03-02](/2ea_v386RWmC3rEIaZzv9A)
* [2022-02-17 ~ 2022-02-24](/NuxOuDd-S2-X2QU1LC38XA)
* [2022-01-27 ~ 2022-02-16](/0OUvvJLXS46NvdMUAutmwQ)
* [2022-01-20 ~ 2022-01-26](/3hKDOgnLS_ao_S4fKsj47g)
* [2022-01-13 ~ 2022-01-19](/nTC0KBEiQ5qaWklFOQ7liA)
* [2022-01-06 ~ 2022-01-12](/9ytsEAWITf2es4qkD8MLzA)
## Questions
* Why do we need the systolic array to support multiple data flows?
    * Trade-offs among utilization, latency, energy, and QoS
* What are the advantages and disadvantages of each data flow?
    * Read/write energy and latency of input-, output-, weight-, and row-stationary data flows
* What are the challenges of building a systolic array that supports multiple data flows?
    * Data-path routing? Choice of data flow (scheduling)? Local vs. global data buffers?
* What do we need to change in the systolic array architecture to support multiple data flows?
    * Data path? Control path?
* How do layer hyper-parameters affect the choice of data flow? (see the sketch after this list)
    * Number of filters, number of channels, input/output size, etc.
* What are the goals of a multi-dataflow systolic array?
    * Meeting QoS while maximizing PE utilization
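To make the hyper-parameter question concrete, below is a toy sketch of a per-layer data flow chooser. The 32×32 array shape, the operand-to-array mappings, and the coverage heuristic are all illustrative assumptions, not this project's actual scheduler:

```python
import math

ARRAY_R, ARRAY_C = 32, 32  # hypothetical PE array shape

def coverage(rows, cols, R=ARRAY_R, C=ARRAY_C):
    """Average fraction of the R x C array occupied when a
    rows x cols stationary tile is folded onto it."""
    folds = math.ceil(rows / R) * math.ceil(cols / C)
    return (rows * cols) / (folds * R * C)

def pick_dataflow(filters, channels, kernel, ofmap_pixels):
    """Toy per-layer policy: pin whichever operand covers more PEs.
    WS pins weights (reduction x filters); OS pins outputs
    (ofmap pixels x filters)."""
    ws = coverage(channels * kernel * kernel, filters)
    os_ = coverage(ofmap_pixels, filters)
    return ("WS", ws) if ws >= os_ else ("OS", os_)

# Few filters + large ofmap favors OS; many filters + deep channels favors WS.
print(pick_dataflow(filters=16, channels=3, kernel=7, ofmap_pixels=112 * 112))
print(pick_dataflow(filters=1024, channels=512, kernel=1, ofmap_pixels=7 * 7))
```

The point is only that filter count, channel depth, and input/output size push different layers toward different stationary choices, which is what a multi-tenant, multi-layer schedule has to reconcile.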
## Reference
* [Planaria](https://ieeexplore.ieee.org/abstract/document/9251939)
* [SCALE-Sim](https://arxiv.org/abs/1811.02883)
## Evaluation
* [lego_double_buffer](https://docs.google.com/spreadsheets/d/1ivHBBZH8MqmSsJ7VKq3PHvSc1-wxG-V67B2uFbta9xo/edit?usp=sharing)
* [planaria_double_buffer](https://docs.google.com/spreadsheets/d/1CDijO0EL82zPHaWeHSGBaM6zmLBpZIkVx_Gabb0xNrg/edit?usp=sharing)
* [TPU_128_WS](https://docs.google.com/spreadsheets/d/1uVLVkE49xSYVVMf7hTk9izpbD2yc9vR9lHoCJPpNJFA/edit?usp=sharing)
* [TPU_128_MD](https://docs.google.com/spreadsheets/d/1n8Li2eJCshwrwKHuvmQ5gIKyRujqzYqjfzZzGXRCT2c/edit?usp=sharing)
## Algorithm
* [scheduler](https://hackmd.io/@S6T8LregRNCJp6gnpLpM-w/rJ0DZigHi)
## Paper Draft
* [DAC 2023](/AaPqjeguQ8evxL77YYhzFQ)
* [Lego paper](/z7MpfC-NTvCts5p2rWp8Rw)
## Reference Papers
1. [Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks](https://ieeexplore.ieee.org/abstract/document/9251939), MICRO 2020
2. [A domain-specific supercomputer for training deep neural networks](https://dl.acm.org/doi/pdf/10.1145/3360307), CACM, 2020
3. [In-Datacenter Performance Analysis of a Tensor Processing Unit](https://arxiv.org/abs/1704.04760), ISCA 2017
4. [MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects](https://anands09.github.io/papers/maeri_asplos2018.pdf), ASPLOS 2018
5. [MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores](https://arxiv.org/pdf/2104.13997.pdf), HPCA 2022
6. [A Multi-Neural Network Acceleration Architecture](https://ieeexplore.ieee.org/document/9138929), ISCA 2020
7. [A Configurable Cloud-Scale DNN Processor for Real-Time AI](https://ieeexplore.ieee.org/document/8416814), ISCA 2018
8. [PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units](https://arxiv.org/abs/1909.04548), HPCA 2020
9. [A Hybrid Systolic-Dataflow Architecture for Inductive Matrix Algorithms](https://ieeexplore.ieee.org/document/9065593), HPCA 2020
10. [FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks](https://ieeexplore.ieee.org/document/7920855), HPCA 2017
11. [Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture](https://people.eecs.berkeley.edu/~ysshao/assets/papers/shao2019-micro.pdf), MICRO 2019
12. [Maximizing CNN Accelerator Efficiency Through Resource Partitioning](https://arxiv.org/abs/1607.00064), ISCA 2017
13. [Heterogeneous Dataflow Accelerators for Multi-DNN Workloads](https://arxiv.org/abs/1909.07437), HPCA 2021
14. [Understanding reuse, performance, and hardware cost of dnn dataflow: A data-centric approach](https://dl.acm.org/doi/10.1145/3352460.3358252), MICRO 2019
15. [mRNA: Enabling Efficient Mapping Space Exploration for a Reconfiguration Neural Accelerator](https://ieeexplore.ieee.org/document/8695674), ISPASS 2019
16. [Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators](https://web.stanford.edu/~mgao12/pubs/tangram.asplos19.pdf), ASPLOS 2019
17. [Capstan: A Vector RDA for Sparsity](https://dl.acm.org/doi/10.1145/3466752.3480047), MICRO 2021
18. [CoSA: Scheduling by Constrained Optimization for Spatial Accelerators](https://arxiv.org/abs/2105.01898), ISCA 2021
19. [SARA: Scaling a Reconfigurable Dataflow Accelerator](https://ieeexplore.ieee.org/document/9499943), ISCA 2021
20. [Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads](https://ieeexplore.ieee.org/document/9138986), ISCA 2020
21. [Deep convolutional neural network architecture with reconfigurable computation patterns](https://ieeexplore.ieee.org/document/7898402), IEEE VLSI Trans, 2017
22. [Kelp: QoS for Accelerated Machine Learning Systems](https://ieeexplore.ieee.org/document/8675247), HPCA 2019
23. [Wire-Aware Architecture and Dataflow for CNN Accelerators](https://dl.acm.org/doi/10.1145/3352460.3358316), MICRO 2019
24. [Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks](https://ieeexplore.ieee.org/document/7551407), ISCA 2016
25. [Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks](https://dl.acm.org/doi/fullHtml/10.1145/3460776), TACO 2021
26. [A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim](https://doi.org/10.1109/ISPASS48437.2020.00016), ISPASS 2020