
# HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

##### link: [Paper](https://vast.cs.ucla.edu/~chiyuze/pub/fpga19-heterocl.pdf), [Github](https://github.com/cornell-zhang/heterocl), [Website](https://heterocl.csl.cornell.edu/web/index.html), [Docs](https://cornell-zhang.github.io/heterocl/index.html)

###### paper origin: FPGA '19

- Problem: It is difficult to design hardware in a **hardware description language** (Verilog, VHDL) because doing so demands extensive specialized hardware knowledge.
- Proposal: HeteroCL, a programming infrastructure composed of a ==Python-based domain-specific language== (DSL) and an FPGA-targeted compilation flow (CPU+FPGA).

## Introduction

Heterogeneous computing platforms, such as CPUs paired with GPUs or FPGAs, are becoming widely available. FPGAs are especially difficult to program. As a result, the use of such platforms has been limited to a small subset of programmers with ==specialized hardware knowledge==.

HeteroCL is a ==programming infrastructure== composed of a ==Python-based== domain-specific language (DSL) and an ==FPGA-targeted compilation flow== (CPU+FPGA). The HeteroCL framework produces highly efficient hardware implementations such as **systolic arrays** and **stencil with dataflow architectures**.

The HeteroCL DSL provides a clean programming abstraction that **decouples** algorithm specification from three important types of hardware customization: (1) compute, (2) data types, and (3) memory architectures.

HeteroCL is a Python-based DSL extended from ==TVM==, and it incorporates state-of-the-art ==HLS optimizations==: ==PolySA== for systolic arrays and ==SODA== for stencil with dataflow architectures. The ==Merlin compiler== serves as one of the back-end tools. The HeteroCL compiler generates ==LLVM code on CPUs== and ==HLS code for FPGA== targets.

![heteroCL framework](https://hackmd.io/_uploads/rJ4b4d3L3.png)

### Why choose TVM?

1. A Python-based DSL provides programmers with a rich set of productive language features such as introspection and a dynamic type system.
2. TVM is a tensor-oriented declarative DSL.
3. TVM inherits the idea of decoupling the algorithm specification from the temporal schedule, which was first proposed by Halide.

### Compute Customization

Compute customization covers performing loop transformations and executing the computation in parallel. Table 1 lists the compute customization primitives currently supported by HeteroCL. The primitives let programmers avoid ==vendor-specific pragmas==, which makes HeteroCL programs **portable** to different back ends.

![heteroCL_table1](https://hackmd.io/_uploads/SJh4F_2Uh.png)

### Data Type Customization

**Quantized computation** using low-bitwidth integers and/or ==fixed-point types== is an essential technique to achieve efficient execution on FPGAs.

![heteroCL_table2](https://hackmd.io/_uploads/H1uU9dnU2.png)

![heteroCL_table3](https://hackmd.io/_uploads/B1q93_n8n.png)

### Memory Customization

Accelerating applications on FPGAs usually requires a high on-chip memory bandwidth to match the throughput of massively parallel compute units.

![heteroCL_table4](https://hackmd.io/_uploads/H1P86_nUn.png)

Example: the `reuse_at` function applied to a CNN:

![heteroCL_figure9b](https://hackmd.io/_uploads/ByiT59n83.png)

### Mapping to Spatial Architecture Templates

![heteroCL_table5](https://hackmd.io/_uploads/SyWZxtnIn.png)

## Back-End Code Generation and Optimization

- General Back End

![heteroCL_table8](https://hackmd.io/_uploads/BkFpMY2Un.png)

### Why choose Merlin compiler?

1. The Merlin compiler leverages a small set of OpenMP-like pragmas to apply certain architecture structures through source-to-source C code transformation.
2. The Merlin compiler generates both HLS C kernels and OpenCL kernels for FPGAs.
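
The pragma-driven flow described in the first point can be pictured with a toy code generator. The Python sketch below is neither HeteroCL's nor Merlin's actual implementation: the `Loop` class, the `emit_c` helper, and the `PRAGMA_FOR` mapping are all hypothetical, and the pragma spellings are assumptions in the Merlin `#pragma ACCEL` style that should be checked against the Merlin documentation. It only illustrates how a back end might turn schedule primitives into pragma-annotated C loops, so that the programmer never writes the pragmas by hand.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from schedule primitives to Merlin-style pragmas.
# The exact spellings are assumptions made for illustration only.
PRAGMA_FOR = {
    "unroll": "#pragma ACCEL parallel factor={factor}",
    "pipeline": "#pragma ACCEL pipeline",
}


@dataclass
class Loop:
    """One loop level plus the schedule primitives attached to it."""
    var: str
    extent: int
    directives: list = field(default_factory=list)  # (primitive, factor) pairs


def emit_c(loops, body):
    """Emit a C loop nest, placing each loop's pragmas directly above it."""
    lines = []
    for depth, loop in enumerate(loops):
        indent = "  " * depth
        for primitive, factor in loop.directives:
            lines.append(indent + PRAGMA_FOR[primitive].format(factor=factor))
        lines.append(
            f"{indent}for (int {loop.var} = 0; "
            f"{loop.var} < {loop.extent}; ++{loop.var}) {{"
        )
    lines.append("  " * len(loops) + body)
    # Close the loop nest from the innermost level outwards.
    for depth in reversed(range(len(loops))):
        lines.append("  " * depth + "}")
    return "\n".join(lines)


# The algorithm (an element-wise add) stays fixed; only the directives change.
loops = [Loop("i", 32), Loop("j", 32)]
loops[0].directives.append(("pipeline", None))
loops[1].directives.append(("unroll", 4))
code = emit_c(loops, "C[i][j] = A[i][j] + B[i][j];")
print(code)
```

Changing or removing the two `directives.append` calls alters only the emitted pragmas, not the loop body, which mirrors the separation between algorithm and hardware customization that the notes describe.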
- Stencil Back End
- Systolic Array Back End

## Evaluation

![heteroCL_table7](https://hackmd.io/_uploads/H1wTwchI2.png)

![heteroCL_table9_10](https://hackmd.io/_uploads/rkp1bcn8n.png)

![heteroCL_table11](https://hackmd.io/_uploads/Byk-Nj282.png)

## Related Work

- TVM
  TVM is a **Python-based DSL** and a ==deep learning compiler== that enables access to high-performance machine learning anywhere. TVM significantly improves code portability across different CPU and GPU architectures[^1].
  ![TVM_diagram](https://raw.githubusercontent.com/apache/tvm-site/main/images/tutorial/overview.png)
- HLS (High-level synthesis)
  HLS, also referred to as **C synthesis, electronic system-level (ESL) synthesis, algorithmic synthesis, or behavioral synthesis**, is an ==automated design process== that takes an abstract behavioral specification of a digital system and finds a ==register-transfer level (RTL)== structure that realizes the given behavior.
- PolySA: Polyhedral-Based Systolic Array Auto-Compilation
  PolySA leverages the power of the **polyhedral model** to achieve end-to-end compilation for systolic array architectures on FPGAs. PolySA is the first ==fully automated compilation framework== for generating high-performance ==systolic array architectures== on the **FPGA**, leveraging recent advances in **high-level synthesis**.
  ![PolySA compilation framework](https://hackmd.io/_uploads/S10hUw2Lh.png)
- SODA (Stencil with Optimized Dataflow Architecture)
  SODA is the framework HeteroCL uses for ==stencil computation==; it generates dataflow architectures for stencil kernels on FPGAs.
- Merlin compiler
  The Merlin compiler takes **C/C++ code** as input and generates an executable that includes the **CPU host code and the FPGA bitstream**.
  ![merlin-diagram](https://raw.githubusercontent.com/falconcomputing/merlin-compiler/master/images/merlin-diagram.png)
- Halide
  Halide is a ==programming language== designed to make it easier to write high-performance ==image and array processing code== on modern machines.
  ==Halide is embedded in C++.==

[^1]: [TVM introduction](https://tvm.apache.org/docs/tutorial/introduction.html#sphx-glr-tutorial-introduction-py)