# LabA-Ch4 Interface Synthesis

## Lab1-Block-Level I/O Protocols

### Code1

```cpp
#include "adders.h"

int adders(int in1, int in2, int in3) {
#pragma HLS latency min=1 max=1
    // Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1

    int sum;

    sum = in1 + in2 + in3;

    return sum;
}
```

### Syn Report1

![image](https://hackmd.io/_uploads/rJ70d43Klx.png)

![image](https://hackmd.io/_uploads/SJ2fKEhKxg.png)

:::info
If the design takes more than one cycle to finish, the `ap_clk` and `ap_rst` ports are added.
:::

### Waveform

![image](https://hackmd.io/_uploads/Hkf7qVhteg.png)

| Signal   | Description                                                                  |
|----------|------------------------------------------------------------------------------|
| ap_start | Starts the design; keep it high until `ap_ready` is asserted.                |
| ap_ready | Indicates the design has finished reading its inputs and can accept new ones. |
| ap_done  | Indicates the current operation/transaction is complete.                     |
| ap_idle  | High when the design is idle; goes low once operation begins.                |

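
The handshake in the signal table can be sketched as a small software model. This is only a toy, not the RTL that HLS generates: `HandshakeModel`, `tick`, and the `latency` parameter are hypothetical names, and the model assumes a non-pipelined block in which `ap_ready` and `ap_done` pulse together in the last cycle of a transaction.

```cpp
#include <cassert>

// Toy, software-only model of the block-level handshake signals.
// Assumption: non-pipelined block, so ap_ready and ap_done pulse together.
struct HandshakeModel {
    int latency;           // hypothetical: cycles from start to done (>= 1)
    int remaining = 0;     // cycles left in the current transaction
    bool ap_ready = false;
    bool ap_done = false;
    bool ap_idle = true;

    explicit HandshakeModel(int latency_cycles) : latency(latency_cycles) {}

    // Advance one clock cycle with the given ap_start level.
    void tick(bool ap_start) {
        ap_ready = false;  // ready/done are one-cycle pulses
        ap_done = false;
        if (remaining == 0 && ap_start) {
            remaining = latency;  // accept a new transaction
        }
        if (remaining > 0) {
            ap_idle = false;      // busy while a transaction is in flight
            --remaining;
            if (remaining == 0) { // last cycle of the transaction
                ap_ready = true;
                ap_done = true;
            }
        } else {
            ap_idle = true;       // nothing to do this cycle
        }
    }
};
```

Holding `ap_start` high for the whole transaction, as the table recommends, drives the model through idle, busy, and done; dropping `ap_start` afterwards returns it to idle.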

### Code2

```cpp
#include "adders.h"

int adders(int in1, int in2, int in3) {
#pragma HLS INTERFACE mode=ap_ctrl_none port=return
#pragma HLS latency min=1 max=1
    // Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1

    int sum;

    sum = in1 + in2 + in3;

    return sum;
}
```

* `ap_ctrl_none` = no block-level `ap` protocol, so the `ap_start`, `ap_ready`, `ap_done`, and `ap_idle` ports are not generated.

### Syn Report2

![image](https://hackmd.io/_uploads/ryFjJrhFel.png)

## Lab2-Port I/O Protocols

### Code

```cpp
#include "adders_io.h"

void adders_io(int in1, int in2, int *in_out1) {
    *in_out1 = in1 + in2 + *in_out1;
}
```

![image](https://hackmd.io/_uploads/ry74GIhKel.png)

### Syn Report

![image](https://hackmd.io/_uploads/B1Pj-82Kgx.png)

### Waveform

![image](https://hackmd.io/_uploads/HyFn6LnFel.png)

## Lab3-Implementing Arrays as RTL Interfaces

### Code1_Single Port RAM

```cpp
void array_io (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;

    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;

    // Accumulate each channel
    For_Loop: for (i=0;i<N;i++) {
        rem=i%CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem];
    }
}
```

### Syn Report1

![image](https://hackmd.io/_uploads/HyDpnD3tex.png)

| Port Name    | Direction | Width | Interface | Group | Function                                        |
|--------------|-----------|-------|-----------|-------|-------------------------------------------------|
| d_o_address0 | out       | 5     | ap_memory | d_o   | Address for write operation (32-depth memory).  |
| d_o_ce0      | out       | 1     | ap_memory | d_o   | Chip enable for write memory access.            |
| d_o_we0      | out       | 1     | ap_memory | d_o   | Write enable signal (1 = write, 0 = no write).  |
| d_o_d0       | out       | 16    | ap_memory | d_o   | Data to be written into memory.                 |
| d_i_address0 | out       | 5     | ap_memory | d_i   | Address for read operation (32-depth memory).   |
| d_i_ce0      | out       | 1     | ap_memory | d_i   | Chip enable for read memory access.             |
| d_i_q0       | in        | 16    | ap_memory | d_i   | Data read from memory.                          |

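
To check the values seen on `d_o_d0` in simulation, a C-level reference model of the per-channel accumulation in Code1 can help. The sketch below is not part of the lab sources: `array_io_ref` is a hypothetical name, the typedefs stand in for `dout_t`, `din_t`, and `dacc_t` (16-bit data matches the report, the 32-bit accumulator is an assumption), `N = 32` follows from the 5-bit address ports, and `CHANNELS = 8` is assumed. The accumulator is passed in explicitly instead of being `static`, so each check starts from a known state.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the lab's header types (not shown in these notes).
typedef int16_t din_t;
typedef int16_t dout_t;
typedef int32_t dacc_t;
const int N = 32;        // matches the 5-bit address ports (32-depth memory)
const int CHANNELS = 8;  // assumption

// Reference model: same arithmetic as array_io, but the accumulator
// state is an explicit argument rather than a static array.
void array_io_ref(dout_t d_o[N], const din_t d_i[N], dacc_t acc[CHANNELS]) {
    for (int i = 0; i < N; i++) {
        int rem = i % CHANNELS;                  // channel of sample i
        acc[rem] += d_i[i];                      // running sum per channel
        d_o[i] = static_cast<dout_t>(acc[rem]);  // output the updated sum
    }
}
```

Feeding an input of all ones makes the behaviour easy to see: each channel is visited `N / CHANNELS` times, so its outputs count up 1, 2, 3, 4.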

### Waveform1

![image](https://hackmd.io/_uploads/rkZewOpKgg.png)

### Code2_Dual Port RAM & FIFO Output

```cpp
void array_io (dout_t d_o[N], din_t d_i[N]) {
#pragma HLS RESOURCE variable=d_i core=RAM_2P
    int i, rem;

    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;

    // Accumulate each channel
    For_Loop: for (i=0;i<N;i++) {
        rem=i%CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem];
    }
}
```

![image](https://hackmd.io/_uploads/SJP1qd3tgl.png)

:::info
If the loop is not unrolled, a dual-port RAM gives the same performance as a single-port RAM, because each iteration reads `d_i` only once.
:::

### Syn Report2

![image](https://hackmd.io/_uploads/Bkn3KO3tee.png)

| Port Name        | Dir | Width | Function                                                                                         |
| ---------------- | --- | ----- | ------------------------------------------------------------------------------------------------ |
| **`d_o_din`**    | out | 16    | The 16-bit data the design pushes into the FIFO (data to be written).                            |
| **`d_o_full_n`** | in  | 1     | FIFO status flag: `1` = not full (safe to write), `0` = FIFO is full (must stop writing).        |
| **`d_o_write`**  | out | 1     | Write enable. Asserted (`1`) to push `d_o_din` into the FIFO. Only valid when `d_o_full_n=1`.    |

### Waveform2

![image](https://hackmd.io/_uploads/SkmLtOTYee.png)

### Code3-Partitioned RAM & FIFO Array Interfaces

:::info
The code is the same as Code2; only the directives differ.
:::

![image](https://hackmd.io/_uploads/S1bH0Optee.png)

### Syn Report3

![image](https://hackmd.io/_uploads/SJe_AuaYel.png)

Partition types for `ARRAY_PARTITION`:

| Type     | Partition Strategy       | Example (8 elements, factor=2) | Pros                     | Cons                 |
| -------- | ------------------------ | ------------------------------ | ------------------------ | -------------------- |
| complete | Every element → register | 8 partitions (1 element each)  | Max parallelism          | Huge resource usage  |
| block    | Contiguous chunks        | `[0–3], [4–7]`                 | Good for block access    | Limited parallelism  |
| cyclic   | Round-robin split        | `[0,2,4,6], [1,3,5,7]`         | Great for loop unrolling | More complex mapping |

### Waveform3

![image](https://hackmd.io/_uploads/ry_EgYpFel.png)

![image](https://hackmd.io/_uploads/r1DSlFTYlx.png)

### Code4-Fully Partitioned Array Interfaces

1. If the array is partitioned into individual elements, it cannot be assigned to a block RAM.
1. The code is the same as Code1.

![image](https://hackmd.io/_uploads/Hk6oZtpYlg.png)

### Syn Report4

:::info
complete = fully partitioned → each element becomes a separate register
:::

### Waveform4

![image](https://hackmd.io/_uploads/B1_H7F6Fll.png)

### Comparison

![image](https://hackmd.io/_uploads/SJP0XKptlx.png)

## Lab4-Implementing AXI4 Interfaces

### Code1-Optimized for Lab3

```cpp
void axi_interfaces (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;

    // Store accumulated data
    static dacc_t acc[CHANNELS];

    // Accumulate each channel
    For_Loop: for (i=0;i<N;i++) {
        rem=i%CHANNELS;
        acc[rem] = acc[rem] + d_i[i];
        d_o[i] = acc[rem];
    }
}
```

![image](https://hackmd.io/_uploads/BkQEwYaFgx.png)

* Without rewind → the unrolled loop runs in separate batches. After each batch finishes, there is control overhead (reset/flush), which slightly reduces efficiency.
* With rewind → the unrolled hardware keeps running continuously and immediately processes the next batch of data. This is especially suitable for streaming data or long loops.
* For **cyclic partition** (factor 2), we can read or write 2 banks simultaneously when unrolling `For_Loop`.
![image](https://hackmd.io/_uploads/HkMC0KaFeg.png)
* For **block partition** (factor 2), we can read or write only 1 bank at a time when unrolling `For_Loop` → **no parallel computation**.
![image](https://hackmd.io/_uploads/ryDUkcaYel.png)

### Syn Report and Compare

![image](https://hackmd.io/_uploads/B1mSpYpKxl.png)

![image](https://hackmd.io/_uploads/r1YO6Fptlx.png)

* `d_i_0`~`d_i_7`, `d_o_0`~`d_o_7`
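
The difference between cyclic and block partitioning noted above comes down to which bank each index lands in. The following is a minimal sketch of that index mapping, assuming the usual definitions (cyclic: round-robin by index, block: contiguous chunks); the function names `cyclic_bank`, `block_bank`, and `pair_is_parallel` are hypothetical and not part of any HLS API.

```cpp
#include <cassert>

// Bank index of element i under cyclic partitioning with `factor` banks.
int cyclic_bank(int i, int factor) {
    return i % factor;  // round-robin: 0,1,0,1,... for factor 2
}

// Bank index of element i under block partitioning of an n-element array.
int block_bank(int i, int n, int factor) {
    int block_size = (n + factor - 1) / factor;  // ceiling division
    return i / block_size;                       // contiguous chunks
}

// True when two accesses issued in the same cycle hit different banks,
// which is what lets an unroll-by-2 loop body run its accesses in parallel.
bool pair_is_parallel(int bank_a, int bank_b) {
    return bank_a != bank_b;
}
```

With 8 elements and factor 2, indices 0 and 1 map to banks 0 and 1 under cyclic partitioning, but both map to bank 0 under block partitioning, so an unrolled pair of adjacent iterations competes for the same bank in the block case.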