Sheng-Chun Kao
Felix Kao
Up-to-date website
Github, Linkedin
Skills
Proficient: Python, Pytorch, JAX, GCP, Cloud TPU, Verilog
Experienced: Tensorflow, C/C++, Matlab
Research Interest and Experience
[Accelerator] DNN accelerator, DNN Mapping/Dataflow, Algorithm-HW co-design
Projects
DNN Accelerator Design Space Exploration (DSE), GaTech, GA
• Framed HW DSE into an RL problem and developed a REINFORCE-based algorithm
• The proposed method outperforms 3 widely used optimization methods (SA, GA, BO) and 6 SOTA RLs
• Main host of a popular open-sourced DNN accelerator cost model, MAESTRO, and a HW DSE framework, ConfuciuX
DNN Dataflow and Schedule Optimization, GaTech, GA
• The proposed genetic algorithm (GA)-based DNN dataflow mapper outperforms other optimizers by 10x-100x
• Developed and hosted an open-sourced DNN dataflow mapper (GAMMA) supporting two most popular open-sourced DNN accelerator cost models MAESTRO and Timeloop
• Developed a scheduler for multi-tenant DNN workload for multi-core accelerator, MAGMA
• Developed a generalizable Decision-Transformer-based DNN mapper, DNNFuser
• Model pruning and efficient attention techniques for training large BERT model for long sequence tasks
• Developed new dataflow tackling the quadratic memory bottleneck of attention layers
SW/HW Co-design for DNN Accelerator and Machine Learnings, GaTech, GA
• Developed an optimized neural-evolution platform with algorithm-HW co-design approach and implemented on FPGA board (PYNQ)
• Developed a GAN-based HW-aware Neural Architecture Search technique
• Developed three RL-based methods optimizing Network-on-Chip performance
Work Experiences
Intern, Google, CA 05/21' - 08/21'
• Research project on speed quality trade-off of efficient Transformer model
Intern, Corporation Technology, Siemens, NJ 05/19’ – 07/19’
• Developed Pytorch DNN, RNN quantization module, for fast evaluation of any quantized DNN models
Publications
- S.-C. Kao, H Kwon, M Pellauer, A Parashar, T Krishna,"A Formalism of DNN Accelerator Flexibility", ACM SIGMETRICS/Performance conference (SIGMETRICS), Jun 2022
- S.-C. Kao, T Krishna, "MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores", HPCA, 2022, paper
- S.-C. Kao, M Pellauer, A Parashar, T Krishna, "DiGamma: Domain-aware Genetic Algorithm for Mapping-HW Co-optimization for DNN Accelerators", Design, Automation and Test in Europe Conference (DATE), March 2022, paper
- S.-C. Kao, X. Huang, T Krishna, “DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators ”, arXiv, Jan 2022, paper
- S.-C. Kao, S. Subramanian, G. Agrawal, T Krishna, "FLAT: An Optimized Dataflow for Mitigating Attention Performance Bottlenecks", arXiv, July 2021, paper
- S.-C. Kao, T Krishna, "E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device", IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021, paper
- S.-C. Kao, T Krishna, “GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm”, IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2020, paper, code, video
- S.-C. Kao, G Jeong, T Krishna, “ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning”, IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, paper, code, video
- S.-C. Kao, A. Ramamurthy, R. Williams, T Krishna, “Conditional Neural Architecture Search”, Resource-Constrained Machine Learning (ReCoML'20), March 2020, paper
- S.-C. Kao, A. Ramamurthy, T Krishna, "Generative Design of Hardware-aware DNNs", arXiv, 2020, paper
- S.-C. Kao, C.-H. Yang, P.-Y. Chen, X. Ma, T. Krishna, “Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization”, IEEE/ACM International Symposium on Networks-on-Chip (NOCS), Oct. 2019, paper, code
- S.-C. Kao, D.-Y. Lee, A.-Y. Wu, “Bloom Filter And Implementation Method Thereof”, US Patent, 2020, paper
- S.-C. Kao, D.-Y. Lee, A.-Y. Wu, “Dynamically Updatable Ternary Segmented Aging Bloom Filter for OpenFlow-Compliant Low-Power Packet Processing”, IEEE/ACM Transaction On Networking, March 2018, paper
- D.-Y. Lee, S.-C. Kao, A.Y. Wu, “Dynamically Updatable Mechanisms for OpenFlow-compliant Low-power Packet Processing”, Book chapter in “Advances in Networks: Security and Communications: Reviews”, 2019, paper
- C.-c. Wang, Y.-T. Chen, D.-Y. Lee, S.-C. Kao, A.-Y. Wu, “Profiling and SW/HW Co-design for Efficient SDN/OpenFlow Data Plane Realization”, IEEE International Conference on Electronics Information and Emergency Communication (ICIEC), July 2017, paper
- E. Qin, G. Jeong, W. Won, S.-C. Kao, H. Kwon, S. Srinivasan, D. Das, G. Moon, S. Rajamanickam, T. Krishna, “Extending Sparse Tensor Accelerators to Support Multiple Compression Formats”, IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2021, paper
Education
Georgia Institute of Technology, Atlanta GA, Ph.D. in Electrical and Computer Engineering in Dr. Tushar Krishna's group - Synergy Lab, GPA: 3.75/4.0
Majoring: Computer Architecture & Software Expt: May 2022
Selected Coursework: Machine Learning Hardware Accelerator, Advanced Computer Architecture, High Performance Parallel Computing, Advanced Machine Learning, Interconnection Network, Digital Image Processing
National Taiwan University, Taipei Taiwan, B.S., M.S. in Electronics Engineering, in Dr. An-Yeu Wu's group - Access Lab, GPA: 3.95/4.0
Selected Coursework: Convex Optimization, GPU Programming, Embedded System