# LabA-Ch4 Interface Synthesis
## Lab1-Block-Level I/O Protocols
### Code1
```cpp
#include "adders.h"

int adders(int in1, int in2, int in3) {
#pragma HLS latency min=1 max=1
    // Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1
    int sum;
    sum = in1 + in2 + in3;
    return sum;
}
```
### Syn Report1


:::info
If the design takes more than one cycle to finish, `ap_clk` and `ap_rst` are added to the interface.
:::
### Waveform

| Signal | Description |
|----------|----------------------------------------------------------------|
| ap_start | Starts the design; keep high until `ap_ready` is asserted. |
| ap_ready | Indicates the design has finished input reads and is ready for new inputs. |
| ap_done | Indicates the current operation/transaction is completed. |
| ap_idle | High when the design is idle; low once operation begins. |
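
The handshake in the table can be sketched as a small cycle-level model. This is a simplified illustration, not tool output: `CtrlHsModel` and `cycles_until_ready` are hypothetical names, and the model assumes a non-pipelined design in which `ap_ready` is asserted in the same cycle as `ap_done`.

```cpp
#include <cassert>

// Simplified, hypothetical cycle-level model of the block-level handshake.
// Assumes a non-pipelined design, where ap_ready rises together with ap_done.
struct CtrlHsModel {
    struct Outputs {
        bool ap_idle;
        bool ap_ready;
        bool ap_done;
    };

    int latency;        // active cycles per transaction (assumed >= 1)
    int remaining = 0;  // active cycles left in the current transaction
    bool busy = false;  // true while a transaction is in flight

    explicit CtrlHsModel(int latency_cycles) : latency(latency_cycles) {}

    // Advance one clock cycle and return the outputs seen in that cycle.
    Outputs tick(bool ap_start) {
        if (!busy && ap_start) {  // ap_start seen while idle: begin a transaction
            busy = true;
            remaining = latency;
        }
        Outputs out{!busy, false, false};  // ap_idle drops as soon as work begins
        if (busy && --remaining == 0) {    // last active cycle
            out.ap_ready = true;
            out.ap_done = true;
            busy = false;
        }
        return out;
    }
};

// Usage sketch: hold ap_start high until ap_ready, as the table recommends.
// Returns the number of cycles ap_start was held high.
int cycles_until_ready(int latency) {
    CtrlHsModel dut(latency);
    assert(dut.tick(false).ap_idle);  // idle before ap_start is raised
    int cycles = 0;
    while (true) {
        CtrlHsModel::Outputs out = dut.tick(true);
        cycles++;
        assert(!out.ap_idle);  // never idle while a transaction runs
        if (out.ap_ready) return cycles;
    }
}
```

In this model `ap_start` is held high for exactly as many cycles as the transaction takes, so `cycles_until_ready(3)` returns 3.
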
### Code2
```cpp
#include "adders.h"

int adders(int in1, int in2, int in3) {
#pragma HLS INTERFACE mode=ap_ctrl_none port=return
#pragma HLS latency min=1 max=1
    // Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1
    int sum;
    sum = in1 + in2 + in3;
    return sum;
}
```
* `ap_ctrl_none` = no block-level I/O protocol, so the `ap_start`, `ap_ready`, `ap_done`, and `ap_idle` ports are not generated.
### Syn Report2

## Lab2-Port I/O Protocols
### Code
```cpp
#include "adders_io.h"

void adders_io(int in1, int in2, int *in_out1) {
    *in_out1 = in1 + in2 + *in_out1;
}
```
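
Because `in_out1` is both read and written, the tool typically splits it into an input data port and an output data port. A minimal C++ testbench sketch makes the two roles explicit; `run_adders_io` is a hypothetical helper, and the function body is repeated only so the sketch compiles on its own.

```cpp
#include <cassert>

// Same body as the lab function, repeated so this sketch compiles on its own.
void adders_io(int in1, int in2, int *in_out1) {
    *in_out1 = in1 + in2 + *in_out1;
}

// Hypothetical testbench helper: returns the value left in the in/out argument.
int run_adders_io(int in1, int in2, int initial) {
    int in_out = initial;          // read side of in_out1
    adders_io(in1, in2, &in_out);  // write side of in_out1
    assert(in_out == in1 + in2 + initial);
    return in_out;
}
```

For example, `run_adders_io(10, 20, 30)` leaves 60 in the in/out argument.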

### Syn Report

### Waveform

## Lab3-Implementing Arrays as RTL Interfaces
### Code1_Single Port RAM
```cpp
void array_io (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;
    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;
    // Accumulate each channel
    For_Loop: for (i = 0; i < N; i++) {
        rem = i % CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem];
    }
}
```
### Syn Report1

| Port Name | Direction | Width | Interface | Group | Function |
|----------------|-----------|-------|------------|-------|---------------------|
| d_o_address0 | out | 5 | ap_memory | d_o | Address for write operation (32-depth memory). |
| d_o_ce0 | out | 1 | ap_memory | d_o | Chip enable for write memory access. |
| d_o_we0 | out | 1 | ap_memory | d_o | Write enable signal (1 = write, 0 = no write). |
| d_o_d0 | out | 16 | ap_memory | d_o | Data to be written into memory. |
| d_i_address0 | out | 5 | ap_memory | d_i | Address for read operation (32-depth memory). |
| d_i_ce0 | out | 1 | ap_memory | d_i | Chip enable for read memory access. |
| d_i_q0 | in | 16 | ap_memory | d_i | Data read from memory. |
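
To relate the ports in the report to the C code, here is a self-contained sketch with one comment per memory access. The type definitions are assumptions, since the lab header is not shown: `N = 32` and 16-bit data follow the report, while `CHANNELS = 8` and the 32-bit accumulator are guesses. `run_array_io_testbench` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definitions (the lab header is not shown). N = 32 and 16-bit data
// follow the report above; CHANNELS = 8 and the 32-bit accumulator are guesses.
constexpr int N = 32;
constexpr int CHANNELS = 8;
typedef std::int16_t din_t;
typedef std::int16_t dout_t;
typedef std::int32_t dacc_t;

void array_io(dout_t d_o[N], din_t d_i[N]) {
    static dacc_t acc[CHANNELS];  // keeps its contents between calls
    for (int i = 0; i < N; i++) {
        int rem = i % CHANNELS;
        acc[rem] += d_i[i];         // read:  d_i_address0 = i, d_i_ce0 = 1, data on d_i_q0
        d_o[i] = (dout_t)acc[rem];  // write: d_o_address0 = i, d_o_ce0 = d_o_we0 = 1, data on d_o_d0
    }
}

// Hypothetical testbench: two calls checked against a reference accumulator.
// Call it once per program run, because acc inside array_io is static.
// Returns the number of outputs checked.
int run_array_io_testbench() {
    dacc_t ref[CHANNELS] = {0};
    int checked = 0;
    for (int call = 0; call < 2; call++) {
        din_t d_i[N];
        dout_t d_o[N];
        for (int i = 0; i < N; i++) d_i[i] = (din_t)(i + call);
        array_io(d_o, d_i);
        for (int i = 0; i < N; i++) {
            ref[i % CHANNELS] += d_i[i];
            assert(d_o[i] == (dout_t)ref[i % CHANNELS]);
            checked++;
        }
    }
    return checked;
}
```

Each loop iteration performs one read on `d_i` and one write on `d_o`, which is why a single port per array is enough for the rolled loop.
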
### Waveform1

### Code2_Dual Port RAM & FIFO Output
```cpp
void array_io (dout_t d_o[N], din_t d_i[N]) {
// Map the input array d_i to a dual-port RAM
#pragma HLS RESOURCE variable=d_i core=RAM_2P
    int i, rem;
    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;
    // Accumulate each channel
    For_Loop: for (i = 0; i < N; i++) {
        rem = i % CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem];
    }
}
```

:::info
If the loop is not unrolled, dual-port SRAM gives the same performance as single-port SRAM, because each iteration reads `d_i` only once.
:::
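
The note above can be illustrated by unrolling the loop by hand. This is a sketch under assumptions: the typedefs and `CHANNELS = 8` are guesses, the accumulator is passed in as a parameter (unlike the lab's `static` array) so the two versions can be compared, and all function names are hypothetical. Whether the tool actually schedules both reads in the same cycle also depends on the other ports, for example how `d_o` is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definitions; the lab header is not shown.
constexpr int N = 32;
constexpr int CHANNELS = 8;
typedef std::int16_t din_t;
typedef std::int16_t dout_t;
typedef std::int32_t dacc_t;

// Rolled loop: one d_i read per iteration, so a single RAM port is enough.
void accumulate_rolled(dout_t d_o[N], const din_t d_i[N], dacc_t acc[CHANNELS]) {
    for (int i = 0; i < N; i++) {
        acc[i % CHANNELS] += d_i[i];
        d_o[i] = (dout_t)acc[i % CHANNELS];
    }
}

// Hand-unrolled by 2, roughly what "#pragma HLS UNROLL factor=2" asks for:
// two d_i reads per iteration, one for each port of a dual-port RAM.
void accumulate_unrolled2(dout_t d_o[N], const din_t d_i[N], dacc_t acc[CHANNELS]) {
    for (int i = 0; i < N; i += 2) {  // N is assumed to be even
        din_t a = d_i[i];      // could use RAM port 0
        din_t b = d_i[i + 1];  // could use RAM port 1
        acc[i % CHANNELS] += a;
        d_o[i] = (dout_t)acc[i % CHANNELS];
        acc[(i + 1) % CHANNELS] += b;
        d_o[i + 1] = (dout_t)acc[(i + 1) % CHANNELS];
    }
}

// Checks that both versions agree; returns the number of outputs compared.
int compare_rolled_and_unrolled() {
    din_t d_i[N];
    for (int i = 0; i < N; i++) d_i[i] = (din_t)(3 * i - 7);
    dout_t out_a[N], out_b[N];
    dacc_t acc_a[CHANNELS] = {0}, acc_b[CHANNELS] = {0};
    accumulate_rolled(out_a, d_i, acc_a);
    accumulate_unrolled2(out_b, d_i, acc_b);
    for (int i = 0; i < N; i++) assert(out_a[i] == out_b[i]);
    return N;
}
```
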
### Syn Report2

| Port Name | Dir | Width | Function |
| ---------------- | --- | ----- | ---------------------------------------------------------------------------------------------------------- |
| **`d_o_din`** | out | 16 | The 16-bit data that your design wants to push into the FIFO. (data to be written) |
| **`d_o_full_n`** | in | 1 | FIFO status flag: `1` = not full (safe to write), `0` = FIFO is full (must stop writing). |
| **`d_o_write`** | out | 1 | Write enable signal. Assert `1` when you want to push `d_o_din` into FIFO. Only valid when `d_o_full_n=1`. |
### Waveform2

### Code3-Partitioned RAM & FIFO Array Interfaces
:::info
The code is the same as Code2; only the directives differ.
:::

### Syn Report3

Partition types for `ARRAY_PARTITION`:

| Type | Partition Strategy | Example (8 elements, factor=2) | Pros | Cons |
| -------- | ------------------------ | ------------------------------ | ------------------------ | -------------------- |
| complete | Every element → register | 8 partitions (1 element each) | Max parallelism | Huge resource usage |
| block | Continuous chunks | `[0–3], [4–7]` | Good for block access | Limited parallelism |
| cyclic | Round-robin split | `[0,2,4,6], [1,3,5,7]` | Great for loop unrolling | More complex mapping |
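
The three strategies in the table amount to different index-to-bank mappings. The sketch below shows one way to write them down; `BankSlot` and the `*_slot` functions are hypothetical names, and the block case assumes chunks of `ceil(size / factor)` elements.

```cpp
#include <cassert>

struct BankSlot {
    int bank;    // which partition holds the element
    int offset;  // position inside that partition
};

// Block partitioning: contiguous chunks of ceil(size / factor) elements.
BankSlot block_slot(int i, int size, int factor) {
    int chunk = (size + factor - 1) / factor;
    return BankSlot{i / chunk, i % chunk};
}

// Cyclic partitioning: elements are dealt out round-robin.
BankSlot cyclic_slot(int i, int factor) {
    return BankSlot{i % factor, i / factor};
}

// Complete partitioning: one element per partition.
BankSlot complete_slot(int i) {
    return BankSlot{i, 0};
}

// Reproduces the 8-element, factor-2 examples from the table.
void check_table_examples() {
    for (int i = 0; i < 8; i++) {
        assert(block_slot(i, 8, 2).bank == (i < 4 ? 0 : 1));
        assert(cyclic_slot(i, 2).bank == i % 2);
        assert(complete_slot(i).bank == i);
    }
}
```

With 8 elements and factor 2, element 5 lands in bank 1 under both schemes, but at offset 1 for block and offset 2 for cyclic.
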
### Waveform3


### Code4-Fully Partitioned Array Interfaces
1. If the array is partitioned into individual elements, it cannot be assigned to a block RAM.
1. The code is the same as Code1.

### Syn Report4
:::info
complete = fully partitioned → each element becomes a separate register
:::
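
The lab applies this as a directive; written as pragmas it might look like the sketch below. This is an assumption, not the lab's actual setup: it partitions both interface arrays, uses the positional `complete` form (newer tool versions also accept `type=complete`), and relies on guessed typedefs. A host compiler ignores the pragmas, so the hypothetical check only confirms the C behavior is unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Assumed definitions; the lab header is not shown.
constexpr int N = 32;
constexpr int CHANNELS = 8;
typedef std::int16_t din_t;
typedef std::int16_t dout_t;
typedef std::int32_t dacc_t;

void array_io_complete(dout_t d_o[N], din_t d_i[N]) {
// Assumed pragma form of the directives: every element gets its own port.
#pragma HLS ARRAY_PARTITION variable=d_i complete dim=1
#pragma HLS ARRAY_PARTITION variable=d_o complete dim=1
    static dacc_t acc[CHANNELS];
    for (int i = 0; i < N; i++) {
        int rem = i % CHANNELS;
        acc[rem] += d_i[i];
        d_o[i] = (dout_t)acc[rem];
    }
}

// Hypothetical single-call check (acc is static, so call it only once).
// With all inputs equal to 1, output i is the number of samples seen so far
// on its channel. Returns the number of outputs checked.
int check_first_call() {
    din_t d_i[N];
    dout_t d_o[N];
    for (int i = 0; i < N; i++) d_i[i] = 1;
    array_io_complete(d_o, d_i);
    for (int i = 0; i < N; i++) assert(d_o[i] == i / CHANNELS + 1);
    return N;
}
```
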
### Waveform4

### Comparison

## Lab4-Implementing AXI4 Interfaces
### Code1-Optimized for Lab3
```cpp
void axi_interfaces (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;
    // Store accumulated data
    static dacc_t acc[CHANNELS];
    // Accumulate each channel
    For_Loop: for (i = 0; i < N; i++) {
        rem = i % CHANNELS;
        acc[rem] = acc[rem] + d_i[i];
        d_o[i] = acc[rem];
    }
}
```

* Without `rewind` (an option of the `PIPELINE` directive) → the unrolled loop runs as multiple batches. After each batch finishes, there is control overhead (reset/flush) before the next one starts, which slightly reduces efficiency.
* With `rewind` → the unrolled hardware keeps running continuously and starts on the next batch of data immediately. This is especially suitable for streaming data or long loops.
* With **cyclic partition** (factor 2), consecutive elements sit in different banks, so the unrolled `For_Loop` can read or write 2 banks simultaneously.
* With **block partition** (factor 2), consecutive elements sit in the same bank, so the unrolled `For_Loop` can read or write only 1 bank at a time → **no parallel computation**.
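
The difference between the two partition types can be checked by counting how often the two accesses of an unrolled iteration fall into the same bank. This is a sketch under assumptions: `N = 32` is taken from the Lab3 report, the unroll factor is 2, each bank is assumed to serve one access per cycle, and all names are hypothetical.

```cpp
#include <cassert>

constexpr int N = 32;      // array depth (assumed, from the Lab3 report)
constexpr int FACTOR = 2;  // partition factor, also used as the unroll factor

int cyclic_bank(int i) { return i % FACTOR; }
int block_bank(int i) { return i / (N / FACTOR); }

// Counts unrolled iterations (i, i + 1) whose two accesses land in the same
// bank. Assumes each bank serves one access per cycle.
int count_same_bank_pairs(int (*bank_of)(int)) {
    int same = 0;
    for (int i = 0; i + 1 < N; i += 2) {
        if (bank_of(i) == bank_of(i + 1)) same++;
    }
    return same;
}

void check_partition_conflicts() {
    assert(count_same_bank_pairs(cyclic_bank) == 0);     // always two different banks
    assert(count_same_bank_pairs(block_bank) == N / 2);  // always the same bank
}
```

Under these assumptions every pair conflicts with block partitioning and none conflicts with cyclic partitioning, which matches the two bullets above.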

### Syn Report and Compare


* Partitioned ports: `d_i_0` to `d_i_7` and `d_o_0` to `d_o_7`