# HLS LabA Chapter4
## Lab 1
```cpp
#include "adders.h"
int adders(int in1, int in2, int in3) {
// Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1
int sum;
sum = in1 + in2 + in3;
return sum;
}
```


```cpp
#include "adders.h"
int adders(int in1, int in2, int in3) {
#pragma HLS INTERFACE ap_ctrl_none port=return
// Prevent IO protocols on all input ports
#pragma HLS INTERFACE ap_none port=in3
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_none port=in1
int sum;
sum = in1 + in2 + in3;
return sum;
}
```

The two versions differ only in whether `#pragma HLS INTERFACE ap_ctrl_none port=return` is applied:
- Without it, Vitis HLS uses the default block-level protocol `ap_ctrl_hs` and automatically inserts the control handshake signals `ap_start`, `ap_done`, `ap_idle`, and `ap_ready`.
- With `ap_ctrl_none`, those `ap_*` control ports are not generated, and the block is treated as continuously running without block-level handshaking.

### Question
1. Show the default block-level, port-level protocol table

2. How to specify the block-level protocol?
Add the pragma `#pragma HLS INTERFACE <mode> port=return`, where `<mode>` is one of `ap_ctrl_hs` (the default), `ap_ctrl_chain`, or `ap_ctrl_none`.
3. Show the interface table, and cross-reference the signals with the corresponding protocol

| C argument | Protocol | RTL Signals |
| ----------------------- | ------------ | --------------------------------------------------------------- |
| `return` | `ap_ctrl_hs` | `ap_start`, `ap_done`, `ap_idle`, `ap_ready`, `ap_return[31:0]` |
| `in1` | `ap_none` | `in1[31:0]` |
| `in2` | `ap_none` | `in2[31:0]` |
| `in3` | `ap_none` | `in3[31:0]` |
4. Show co-simulation waveform, explain the ap_ctrl_hs interface behavior


When `ap_start` and `ap_ready` are both 1 on a rising edge, a transaction is accepted. `ap_done` goes high when that transaction completes (in this design, the same cycle due to zero-cycle latency). Because the block is fully combinational and has no in-flight work across cycles, `ap_idle` remains 1. The ability to accept new data every cycle is indicated by `ap_ready` staying high (II=1).
5. Use ap_ctrl_none -> Cosim failures

No failure was observed in co-simulation.
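
The `ap_ctrl_hs` behavior described in item 4 can be approximated by a small software model. This is only a sketch, under the assumption that for a zero-latency combinational block `ap_done` and `ap_ready` follow `ap_start` in the same cycle and `ap_idle` stays at 1. The names `CtrlHs`, `ctrl_hs_comb`, and `count_done` are hypothetical and are not part of the Vitis HLS output.

```cpp
#include <cassert>

// Hypothetical software view of the block-level handshake outputs.
struct CtrlHs {
    bool ap_done;
    bool ap_idle;
    bool ap_ready;
    int  ap_return;
};

// Assumption: the adder is purely combinational, so a transaction that
// starts in a cycle also completes in that same cycle.
CtrlHs ctrl_hs_comb(bool ap_start, int in1, int in2, int in3) {
    CtrlHs out;
    out.ap_done   = ap_start;        // done in the same cycle as start
    out.ap_ready  = ap_start;        // new inputs can be taken in the same cycle
    out.ap_idle   = true;            // no work is ever in flight across cycles
    out.ap_return = in1 + in2 + in3; // same arithmetic as adders()
    return out;
}

// Hold ap_start high for 'cycles' cycles and count completed transactions.
// With zero latency, one transaction completes per cycle (II = 1).
int count_done(int cycles) {
    assert(cycles >= 0);
    int done = 0;
    for (int c = 0; c < cycles; ++c) {
        if (ctrl_hs_comb(true, c, c, c).ap_done) {
            ++done;
        }
    }
    return done;
}
```

In this model, holding `ap_start` high for five cycles yields five completed transactions, which is the software counterpart of `ap_ready` staying high in the waveform.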
## Lab 2
```tcl
############################################################
## This file is generated automatically by Vitis HLS.
## Please DO NOT edit it.
## Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
############################################################
set_directive_interface -mode ap_vld "adders_io" in1
set_directive_interface -mode ap_ack "adders_io" in2
set_directive_interface -mode ap_hs "adders_io" in_out1
```



Note: `ap_hs` includes both `ap_vld` and `ap_ack`. The pointer argument `in_out1` is both an input and an output of the function.
### Question
1. List all the port-level protocols from the Vitis HLS (2022.1) manual
`ap_none`, `ap_vld`, `ap_ack`, `ap_hs`, `ap_ovld`, `ap_fifo`, `ap_memory`, `bram`, `axis`, `s_axilite`, `m_axi`
2. Show the interface table & waveform to explain the signal behavior
- `in1` (`ap_vld`): `in1_ap_vld = 1` means the data on `in1` is valid in that cycle, but it does not confirm whether the block actually consumed it.
- `in2` (`ap_ack`): `in2_ap_ack = 1` means the block accepted/consumed the data in that cycle. The driver should hold the input stable and only change it after seeing `ap_ack`.
- `in_out1` (`ap_hs`): It has both `vld` and `ack` signals. For input and output alike, a transfer completes only when `vld` and `ack` are both 1 on the same clock edge.
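
The port-level behaviors above come down to one rule: a value moves only in a cycle where valid and acknowledge are both high. The following is a minimal simulation sketch of an `ap_hs` input under that rule. `simulate_ap_hs` and its arguments are hypothetical; they only model the protocol and are not generated by Vitis HLS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cycle-level model of one ap_hs port.
// 'data' is the sequence the sender wants to transfer.
// 'ack_pattern[c]' says whether the receiver asserts ack in cycle c.
// The sender keeps vld high and holds the current value until it is acked.
std::vector<int> simulate_ap_hs(const std::vector<int>& data,
                                const std::vector<bool>& ack_pattern) {
    std::vector<int> received;
    std::size_t next = 0;                     // index of the value being offered
    for (std::size_t c = 0; c < ack_pattern.size(); ++c) {
        const bool vld = next < data.size();  // valid while data remains
        const bool ack = ack_pattern[c];
        if (vld && ack) {                     // transfer completes on this edge
            received.push_back(data[next]);
            ++next;                           // only now may the sender change the value
        }
    }
    assert(received.size() <= data.size());
    return received;
}
```

Cycles in which `ack` is low simply stall the sender; no value is lost or duplicated, because the offered value changes only after a completed transfer.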
## Lab 3
### Single-port RAM(Solution1)
```cpp
#include "array_io.h"
// The data comes in organized in a single array.
// - The first sample for the first channel (CHAN)
// - Then the first sample for the 2nd channel etc.
// The channels are accumulated independently
// E.g. For 8 channels:
// Array Order : 0 1 2 3 4 5 6 7 8 9 10 etc. 16 etc...
// Sample Order: A0 B0 C0 D0 E0 F0 G0 H0 A1 B1 C1 etc. A2 etc...
// Output Order: A0 B0 C0 D0 E0 F0 G0 H0 A0+A1 B0+B1 C0+C1 etc. A0+A1+A2 etc...
void array_io (dout_t d_o[N], din_t d_i[N]) {
int i, rem;
// Store accumulated data
static dacc_t acc[CHANNELS];
dacc_t temp;
// Accumulate each channel
For_Loop: for (i=0;i<N;i++) {
rem=i%CHANNELS;
temp = acc[rem] + d_i[i];
acc[rem] = temp;
d_o[i] = acc[rem];
}
}
```
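
To make the ordering described in the comments concrete, here is a plain C++ sketch of the same channel-interleaved accumulation that compiles without the HLS headers. The function `array_io_ref`, the use of `int` for all data, and the runtime `channels` parameter are illustrative assumptions; the real `din_t`, `dout_t`, `dacc_t`, `N`, and `CHANNELS` come from `array_io.h`, which is not shown here. Unlike the HLS version, the accumulator is local rather than `static`, so every call starts from zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model: sample i belongs to channel (i % channels), and each
// output element is the running sum of its channel up to that sample.
std::vector<int> array_io_ref(const std::vector<int>& d_i, std::size_t channels) {
    assert(channels > 0);
    std::vector<int> acc(channels, 0);   // one accumulator per channel
    std::vector<int> d_o(d_i.size());
    for (std::size_t i = 0; i < d_i.size(); ++i) {
        const std::size_t rem = i % channels;
        acc[rem] += d_i[i];
        d_o[i] = acc[rem];
    }
    return d_o;
}
```

With 8 channels and 16 input samples that are all 1, the first 8 outputs are 1 and the next 8 are 2, matching the `A0 ... H0, A0+A1 ...` pattern in the comments.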

### Dual-port RAM(Solution2)
```tcl
set_directive_top -name array_io "array_io"
set_directive_unroll "array_io/For_Loop"
set_directive_interface -mode bram -storage_impl bram -storage_type ram_2p "array_io" d_i
set_directive_interface -mode ap_fifo "array_io" d_o
```

### Array Partition(Solution3)
```tcl
set_directive_top -name array_io "array_io"
set_directive_unroll "array_io/For_Loop"
set_directive_array_partition -dim 1 -factor 2 -type block "array_io" d_i
set_directive_array_partition -dim 1 -factor 4 -type block "array_io" d_o
set_directive_interface -mode ap_fifo "array_io" d_o
set_directive_interface -mode bram -storage_type ram_2p -storage_impl bram "array_io" d_i
```

### Fully Partitioned Array Interfaces(Solution4)
```tcl
set_directive_top -name array_io "array_io"
set_directive_unroll "array_io/For_Loop"
set_directive_array_partition -dim 1 -type complete "array_io" d_i
set_directive_array_partition -dim 1 -type complete "array_io" d_o
set_directive_interface -mode ap_fifo "array_io" d_o
```

### Result Compare
#### Performance

#### Utilization

### Question
1. Rolled loop, use dual-port RAM. What does the synthesis report show?

In the synthesis report, the `d_i` interface shows the extra `q1`, `ce1`, and `address1` signals, indicating that it supports dual-port reads.
2. Unroll the loop and compare the latency for the combinations of input (single/dual port) and output (single/dual port). Explain why.
Single/Dual => solution1/solution2
Although making `d_i` dual-port can improve the input-side initiation interval (II), the output `d_o` is implemented as a FIFO, which accepts its writes sequentially, so the achieved II stays the same. The overall latency is therefore nearly unchanged between the two designs.

3. Array partition
• Unroll & array_partition with different type = block/cyclic/complete, factor = 2, 4
• Observe the latency and resource usage, and explain why.

Solution 4/5/6/7/8 => complete/block2/block4/cyclic2/cyclic4

Latency: Complete < Cyclic4 < Block4 < Cyclic2 < Block2

Resource usage: Complete < Block4 < Block2 < Cyclic2 ≈ Cyclic4

**Latency**:
- Complete: Everything is in registers, so there are no bank conflicts → smallest latency.
- Cyclic4: Data are striped across banks (bank1: 0,4,8…; bank2: 1,5,9…; bank3: 2,6,10…; bank4: 3,7,11…). Our loop mostly accesses consecutive indices, so conflicts are rare → second-best latency.
- Block4: Consecutive addresses fall into the same bank, so conflicts are more frequent than with Cyclic4 → latency is slightly worse.
- Cyclic2 / Block2: Fewer banks increase contention → worst latency among the options.

**Resource Usage**:
- Complete: Implemented entirely with a small number of registers and minimal control logic, which is more efficient than using RAM in this case.
- Block4 vs. Block2: Block4 uses fewer resources; Block2 sees more frequent conflicts and therefore needs more arbitration/scheduling logic.
- Cyclic4 ≈ Cyclic2: Both need bank selection (effectively a modulo-based mapping) and mux/demux control; with this small total size, that control overhead dominates, so their resource usage is about the same.
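
The bank-conflict argument can be checked with a small index-to-bank calculation. The sketch below assumes the usual meaning of the two partition types: cyclic interleaves elements across banks, and block gives each bank one contiguous range. `cyclic_bank`, `block_bank`, and `distinct_banks` are hypothetical helpers, banks are numbered from 0 here rather than from 1, and the array size of 16 used in the note afterwards is an arbitrary illustrative value, not the lab's `N`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Cyclic partitioning: element i goes to bank (i mod factor).
std::size_t cyclic_bank(std::size_t i, std::size_t factor) {
    assert(factor > 0);
    return i % factor;
}

// Block partitioning: the array is cut into 'factor' contiguous blocks.
std::size_t block_bank(std::size_t i, std::size_t n, std::size_t factor) {
    assert(factor > 0 && i < n);
    const std::size_t block_size = (n + factor - 1) / factor;  // ceil(n / factor)
    return i / block_size;
}

// Number of different banks touched by 'count' consecutive indices starting
// at 'start'. Touching more distinct banks means fewer port conflicts.
std::size_t distinct_banks(std::size_t start, std::size_t count,
                           std::size_t n, std::size_t factor, bool cyclic) {
    std::set<std::size_t> banks;
    for (std::size_t i = start; i < start + count; ++i) {
        banks.insert(cyclic ? cyclic_bank(i, factor) : block_bank(i, n, factor));
    }
    return banks.size();
}
```

For example, with 16 elements and factor 4, indices 0 to 3 land in four different banks under cyclic partitioning but in a single bank under block partitioning, which is consistent with Cyclic4 having lower latency than Block4 for consecutive accesses.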
## Lab 4
### Cyclic Partition (Axis I/O)


### Axi4-lite Imp Addr
```cpp
// ==============================================================
// Vitis HLS - High-Level Synthesis from C, C++ and OpenCL v2022.1 (64-bit)
// Tool Version Limit: 2022.04
// Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
// ==============================================================
// control
// 0x0 : Control signals
// bit 0 - ap_start (Read/Write/COH)
// bit 1 - ap_done (Read/COR)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read/COR)
// bit 7 - auto_restart (Read/Write)
// bit 9 - interrupt (Read)
// others - reserved
// 0x4 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x8 : IP Interrupt Enable Register (Read/Write)
// bit 0 - enable ap_done interrupt (Read/Write)
// bit 1 - enable ap_ready interrupt (Read/Write)
// others - reserved
// 0xc : IP Interrupt Status Register (Read/COR)
// bit 0 - ap_done (Read/COR)
// bit 1 - ap_ready (Read/COR)
// others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
#define XAXI_INTERFACES_CONTROL_ADDR_AP_CTRL 0x0
#define XAXI_INTERFACES_CONTROL_ADDR_GIE 0x4
#define XAXI_INTERFACES_CONTROL_ADDR_IER 0x8
#define XAXI_INTERFACES_CONTROL_ADDR_ISR 0xc
```
### Question
1. Stream
• Unroll the loop, and observe how many axis channels are created
• Compare the area with Labs 1-3

48 axis channels are created.
The area is comparable to Lab 3 (Complete), smaller than Lab 3’s Cyclic and Block configurations, and much larger than in Lab 1 and Lab 2.
2. Axilite
It is used to communicate with the host program.
Show _hw.h and explain its content
```cpp
// ==============================================================
// Vitis HLS - High-Level Synthesis from C, C++ and OpenCL v2022.1 (64-bit)
// Tool Version Limit: 2022.04
// Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
// ==============================================================
// control
// 0x0 : Control signals
// bit 0 - ap_start (Read/Write/COH)
// bit 1 - ap_done (Read/COR)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read/COR)
// bit 7 - auto_restart (Read/Write)
// bit 9 - interrupt (Read)
// others - reserved
// 0x4 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x8 : IP Interrupt Enable Register (Read/Write)
// bit 0 - enable ap_done interrupt (Read/Write)
// bit 1 - enable ap_ready interrupt (Read/Write)
// others - reserved
// 0xc : IP Interrupt Status Register (Read/COR)
// bit 0 - ap_done (Read/COR)
// bit 1 - ap_ready (Read/COR)
// others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
#define XAXI_INTERFACES_CONTROL_ADDR_AP_CTRL 0x0
#define XAXI_INTERFACES_CONTROL_ADDR_GIE 0x4
#define XAXI_INTERFACES_CONTROL_ADDR_IER 0x8
#define XAXI_INTERFACES_CONTROL_ADDR_ISR 0xc
```
This file provides the address offsets of the control registers (`AP_CTRL` at 0x0, `GIE` at 0x4, `IER` at 0x8, `ISR` at 0xc) and documents the bit-field layout and access type of each register.
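
As an illustration of how a host program might use these offsets, the sketch below starts the IP and polls for completion through a memory-mapped register window. The helper names (`reg_read`, `reg_write`, `ip_start`, `ip_is_done`, `ip_enable_done_irq`) and the bit-mask constants are hypothetical and written for this example only; just the four offsets and the bit positions are taken from the header above. How the base pointer is obtained on a real system is outside the scope of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Offsets copied from the header above.
constexpr std::uint32_t ADDR_AP_CTRL = 0x0;
constexpr std::uint32_t ADDR_GIE     = 0x4;
constexpr std::uint32_t ADDR_IER     = 0x8;
constexpr std::uint32_t ADDR_ISR     = 0xc;

// Hypothetical masks derived from the bit positions in the header comments.
constexpr std::uint32_t AP_START     = 1u << 0;
constexpr std::uint32_t AP_DONE      = 1u << 1;
constexpr std::uint32_t AP_IDLE      = 1u << 2;
constexpr std::uint32_t AP_READY     = 1u << 3;
constexpr std::uint32_t AUTO_RESTART = 1u << 7;

// 'base' points at the start of the 32-bit register window.
inline std::uint32_t reg_read(const volatile std::uint32_t* base, std::uint32_t offset) {
    assert(offset % 4 == 0);
    return base[offset / 4];
}

inline void reg_write(volatile std::uint32_t* base, std::uint32_t offset, std::uint32_t value) {
    assert(offset % 4 == 0);
    base[offset / 4] = value;
}

// Set ap_start while preserving the auto_restart setting.
inline void ip_start(volatile std::uint32_t* base) {
    const std::uint32_t keep = reg_read(base, ADDR_AP_CTRL) & AUTO_RESTART;
    reg_write(base, ADDR_AP_CTRL, keep | AP_START);
}

// Check bit 1 (ap_done) of the control register.
inline bool ip_is_done(const volatile std::uint32_t* base) {
    return (reg_read(base, ADDR_AP_CTRL) & AP_DONE) != 0;
}

// Enable the global interrupt and the ap_done interrupt source.
inline void ip_enable_done_irq(volatile std::uint32_t* base) {
    reg_write(base, ADDR_GIE, 1u);
    reg_write(base, ADDR_IER, reg_read(base, ADDR_IER) | 1u);
}
```

Because the helpers only take a pointer, they can be exercised on the host against a plain four-element `std::uint32_t` array standing in for the hardware registers before being pointed at a real mapping.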