Sheng-Chun Kao
===

###### tags: `CV`

### Felix Kao

[Up-to-date website](https://felix0901.github.io/felix.github.io)

[Github](http://github.com/felix0901), [Linkedin](https://www.linkedin.com/in/sheng-chun-kao-felix0901/)

## Skills

#### Proficient: Python, Pytorch, JAX, GCP, Cloud TPU, Verilog

#### Experienced: Tensorflow, C/C++, Matlab

## Research Interest and Experience

#### [ML] ML-based automation, RL, GA-based optimization, Transformers, efficient attention for long sequences, pruning, quantization, neural architecture search

#### [Accelerator] DNN accelerators, DNN mapping/dataflow, algorithm-HW co-design

## Projects

#### DNN Accelerator Design Space Exploration (DSE), GaTech, GA

• Framed HW DSE as an RL problem and developed a REINFORCE-based algorithm
• The proposed method outperforms 3 widely used optimization methods (SA, GA, BO) and 6 SOTA RL algorithms
• Main host of a popular open-sourced DNN accelerator cost model, [MAESTRO](http://maestro.ece.gatech.edu/), and a HW DSE framework, [ConfuciuX](https://github.com/maestro-project/confuciux)

#### DNN Dataflow and Schedule Optimization, GaTech, GA

• The proposed genetic algorithm (GA)-based DNN dataflow mapper outperforms other optimizers by 10x-100x
• Developed and host an open-sourced DNN dataflow mapper ([GAMMA](https://github.com/maestro-project/gamma)) supporting the two most popular open-sourced DNN accelerator cost models, MAESTRO and Timeloop
• Developed MAGMA, a scheduler for multi-tenant DNN workloads on multi-core accelerators
• Developed DNNFuser, a generalizable Decision-Transformer-based DNN mapper

#### Efficient Attention and Transformers, GaTech, GA

• Applied model pruning and efficient attention techniques to train large BERT models on long-sequence tasks
• Developed a new dataflow tackling the quadratic memory bottleneck of attention layers

#### SW/HW Co-design for DNN Accelerators and Machine Learning, GaTech, GA

• Developed an optimized neuro-evolution platform with an algorithm-HW co-design approach and implemented it on an FPGA board (PYNQ)
• Developed a GAN-based HW-aware neural architecture search technique
• Developed three RL-based methods optimizing Network-on-Chip performance

## Work Experience

### Intern, Google, CA, 05/21 - 08/21

• Research project on the speed-quality trade-off of efficient Transformer models

### Intern, Corporate Technology, Siemens, NJ, 05/19 - 07/19

• Developed a Pytorch DNN/RNN quantization module for fast evaluation of arbitrary quantized DNN models

## Publications

1. S.-C. Kao, H. Kwon, M. Pellauer, A. Parashar, T. Krishna, "A Formalism of DNN Accelerator Flexibility", ACM SIGMETRICS/Performance Conference (SIGMETRICS), Jun 2022
2. S.-C. Kao, T. Krishna, "MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores", HPCA, 2022, [paper](https://arxiv.org/abs/2104.13997)
3. S.-C. Kao, M. Pellauer, A. Parashar, T. Krishna, "DiGamma: Domain-aware Genetic Algorithm for Mapping-HW Co-optimization for DNN Accelerators", Design, Automation and Test in Europe Conference (DATE), March 2022, [paper](https://arxiv.org/abs/2201.11220)
4. S.-C. Kao, X. Huang, T. Krishna, "DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators", arXiv, Jan 2022, [paper](https://arxiv.org/abs/2201.11218)
5. S.-C. Kao, S. Subramanian, G. Agrawal, T. Krishna, "FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks", arXiv, July 2021, [paper](https://arxiv.org/abs/2107.06419)
6. S.-C. Kao, T. Krishna, "E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device", IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021, [paper](https://cpb-us-w2.wpmucdn.com/sites.gatech.edu/dist/c/332/files/2021/04/e3-inax_ispass2021.pdf)
7. S.-C. Kao, T. Krishna, "GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm", IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2020, [paper](https://cpb-us-w2.wpmucdn.com/sites.gatech.edu/dist/c/332/files/2020/08/gamma_iccad2020.pdf), [code](https://github.com/maestro-project/gamma), [video](https://www.youtube.com/watch?v=gfBFRBbcA10)
8. S.-C. Kao, G. Jeong, T. Krishna, "ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning", IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, [paper](https://arxiv.org/pdf/2009.02010.pdf), [code](https://github.com/maestro-project/confuciux), [video](https://www.youtube.com/watch?v=qHuO_38CdWQ)
9. S.-C. Kao, A. Ramamurthy, R. Williams, T. Krishna, "Conditional Neural Architecture Search", Resource-Constrained Machine Learning (ReCoML'20), March 2020, [paper](https://arxiv.org/abs/2006.03969)
10. S.-C. Kao, A. Ramamurthy, T. Krishna, "Generative Design of Hardware-aware DNNs", arXiv, 2020, [paper](https://arxiv.org/abs/2006.03968)
11. S.-C. Kao, C.-H. Yang, P.-Y. Chen, X. Ma, T. Krishna, "Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization", IEEE/ACM International Symposium on Networks-on-Chip (NOCS), Oct. 2019, [paper](https://arxiv.org/abs/1908.04484), [code](https://github.com/felix0901/interconnect-routing-gym)
12. S.-C. Kao, D.-Y. Lee, A.-Y. Wu, "Bloom Filter And Implementation Method Thereof", US Patent, 2020, [patent](https://patentscope.wipo.int/search/en/detail.jsf?docId=US277544115)
13. S.-C. Kao, D.-Y. Lee, A.-Y. Wu, "Dynamically Updatable Ternary Segmented Aging Bloom Filter for OpenFlow-Compliant Low-Power Packet Processing", IEEE/ACM Transactions on Networking, March 2018, [paper](https://ieeexplore.ieee.org/document/8322446)
14. D.-Y. Lee, S.-C. Kao, A.-Y. Wu, "Dynamically Updatable Mechanisms for OpenFlow-compliant Low-power Packet Processing", book chapter in "Advances in Networks: Security and Communications: Reviews", 2019, [book](https://www.amazon.com/Advances-Networks-Security-Communications-Reviews/dp/8409145103)
15. C.-C. Wang, Y.-T. Chen, D.-Y. Lee, S.-C. Kao, A.-Y. Wu, "Profiling and SW/HW Co-design for Efficient SDN/OpenFlow Data Plane Realization", IEEE International Conference on Electronics Information and Emergency Communication (ICEIEC), July 2017, [paper](https://ieeexplore.ieee.org/abstract/document/8076600)
16. E. Qin, G. Jeong, W. Won, S.-C. Kao, H. Kwon, S. Srinivasan, D. Das, G. Moon, S. Rajamanickam, T. Krishna, "Extending Sparse Tensor Accelerators to Support Multiple Compression Formats", IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2021, [paper](https://arxiv.org/pdf/2103.10452.pdf)

## Education

### Georgia Institute of Technology, Atlanta, GA

Ph.D. in Electrical and Computer Engineering, Dr. Tushar Krishna's group - [Synergy Lab](https://synergy.ece.gatech.edu/), GPA: 3.75/4.0
**Major**: Computer Architecture & Software, **Expected graduation**: May 2022
**Selected Coursework**: Machine Learning Hardware Accelerator, Advanced Computer Architecture, High Performance Parallel Computing, Advanced Machine Learning, Interconnection Network, Digital Image Processing

### National Taiwan University, Taipei, Taiwan

B.S., M.S. in Electronics Engineering, Dr. An-Yeu Wu's group - [Access Lab](http://access.ee.ntu.edu.tw/), GPA: 3.95/4.0
**Selected Coursework**: Convex Optimization, GPU Programming, Embedded System