# Socstudy
The best reference: https://docs.xilinx.com
## Use Verilog to implement AXI4-Lite and AXI4-Stream
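### Introduction of AXI4-Lite
Here is a list of all AXI4-Lite signals (the AMBA specification writes the names without the underscore, e.g. AWADDR, WVALID):

| Channel | Signal | Width |
| -------- | -------- | -------- |
| Write address | AW_ADDR | ADDR_WIDTH-1 : 0 |
| | AW_VALID | 1 |
| | AW_READY | 1 |
| | AW_PROT (write-channel protection signal) | 2 : 0 |
| Write data | W_DATA | DATA_WIDTH-1 : 0 |
| | W_STRB | (DATA_WIDTH/8)-1 : 0 |
| | W_VALID | 1 |
| | W_READY | 1 |
| Write response | B_RESP | 1 : 0 |
| | B_VALID | 1 |
| | B_READY | 1 |
| Read address | AR_ADDR | ADDR_WIDTH-1 : 0 |
| | AR_VALID | 1 |
| | AR_READY | 1 |
| | AR_PROT (read-channel protection signal) | 2 : 0 |
| Read data | R_DATA | DATA_WIDTH-1 : 0 |
| | R_RESP | 1 : 0 |
| | R_VALID | 1 |
| | R_READY | 1 |
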
#### Write transaction handshake example
A write uses the write address channel, the write data channel, and the write response channel.
1. On the write data channel, the master drives the write data 0x0F and, one clock cycle later, pulls the data valid signal WVALID high, then waits for the slave to pull WREADY high.
2. On the write address channel, the master drives the write address 0x00 (the control register for the output data of GPIO channel 1) and pulls the address valid signal AWVALID high, then waits for the slave to pull AWREADY high.
3. When WVALID and WREADY are high at the same time, the data is written to the GPIO slave; when AWVALID and AWREADY are high at the same time, the address is written to the GPIO slave.
4. One clock cycle later, the slave returns a response on the write response channel (BRESP is 0) to tell the master that the write succeeded.

On the write response channel, BREADY is driven by the master, which normally keeps it high so that it is ready to receive a response. When the slave asserts BVALID to signal a valid response, the master pulls BREADY low for a while as it processes the response, and then returns to the ready state.

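RRESP and BRESP are the 2-bit read and write response signals. Their four encodings are 0b00 OKAY (successful read or write), 0b01 EXOKAY (successful exclusive access), 0b10 SLVERR (slave error), and 0b11 DECERR (decode error). AXI4-Lite does not support exclusive access, so EXOKAY does not occur on an AXI4-Lite interface.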
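The response codes can be decoded in software, for example in a testbench or a driver. The following is a minimal sketch; the names `AxiResp`, `resp_name`, and `resp_is_error` are hypothetical helpers for illustration and are not part of any Xilinx library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The four 2-bit AXI response encodings (OKAY, EXOKAY, SLVERR, DECERR).
enum class AxiResp : std::uint8_t { OKAY = 0, EXOKAY = 1, SLVERR = 2, DECERR = 3 };

// Hypothetical helper: keep only the two response bits and return the code name.
inline std::string resp_name(std::uint8_t resp_bits) {
    switch (static_cast<AxiResp>(resp_bits & 0x3u)) {
        case AxiResp::OKAY:   return "OKAY";
        case AxiResp::EXOKAY: return "EXOKAY";
        case AxiResp::SLVERR: return "SLVERR";
        case AxiResp::DECERR: return "DECERR";
    }
    return "UNKNOWN";  // not reachable: all four 2-bit values are handled above
}

// Both error codes (SLVERR = 2, DECERR = 3) have bit 1 set.
inline bool resp_is_error(std::uint8_t resp_bits) {
    return (resp_bits & 0x2u) != 0;
}
```

In the examples of this section the slave answers with 0, so `resp_name(0)` gives "OKAY" and `resp_is_error(0)` is false.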

#### Read transaction handshake example
First write data 0xFF to address 0x00, then read it back using the read address channel and the read data channel.
1. On the read address channel, the master drives the address to read, 0x00 (the control register for the output data of GPIO channel 1), and pulls the address valid signal ARVALID high, then waits for the slave to pull ARREADY high. At that point the address to read has been passed to the slave.
2. On the read data channel, the slave drives the read data 0xFF and pulls the data valid signal RVALID high. In this example the master keeps RREADY high the whole time, so the data is transferred as soon as both signals are high, and 0xFF is returned to the master. The read response RRESP[1:0] is 0, which means the read succeeded.
3. When ARVALID and ARREADY are high at the same time, the address is passed to the GPIO slave; when RVALID and RREADY are high at the same time, the data is returned to the ZYNQ master.

#### AXI4 bus handshake mechanism
The examples above show two of the orders in which Valid and Ready can appear: Valid high first, and Ready high first. In fact, there are three possible cases:
1. Valid goes high first, then Ready.
2. Ready goes high first, then Valid.
3. Ready and Valid go high at the same time.

In all three cases, the transfer occurs on the rising edge of the clock ACLK at which Valid and Ready are both sampled high.
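The rule shared by the three cases can be checked with a small software model. The sketch below samples Valid and Ready once per rising edge of ACLK and reports the edge on which the transfer happens. The waveforms and all names (`Sample`, `first_transfer_edge`, and so on) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Levels of VALID and READY sampled at one rising edge of ACLK.
struct Sample { bool valid; bool ready; };

// Returns the index of the first edge at which VALID and READY are both high
// (the edge on which the transfer happens), or -1 if there is no such edge.
inline int first_transfer_edge(const std::vector<Sample>& wave) {
    for (std::size_t i = 0; i < wave.size(); ++i) {
        if (wave[i].valid && wave[i].ready) return static_cast<int>(i);
    }
    return -1;
}

// Hypothetical waveforms, one entry per clock edge.
inline std::vector<Sample> valid_first() {
    return {{false, false}, {true, false}, {true, true}};
}
inline std::vector<Sample> ready_first() {
    return {{false, false}, {false, true}, {true, true}};
}
inline std::vector<Sample> same_time() {
    return {{false, false}, {true, true}, {false, false}};
}
inline std::vector<Sample> never_ready() {
    return {{true, false}, {true, false}, {true, false}};
}
```

Whichever signal rises first, nothing is transferred until the edge where both are high; `never_ready()` shows a source that keeps Valid asserted and simply waits.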

#### AXI4 bus channel dependency


### Verilog code of AXI4-Lite
Vivado provides a quick way to generate the Verilog code ([reference](<https://xilinx.eetrend.com/blog/2023/100568047.html>)):
- [AXI4_Lite_master code](<https://github.com/nthuyouwei/soclab/blob/main/lab_0/SOCStudy/axi_lite_master_v1_0_M00_AXI.v>)
- [AXI4_Lite_slave code](<https://github.com/nthuyouwei/soclab/blob/main/lab_0/SOCStudy/axi_lite_slave_v1_0_S00_AXI.v>)

There are also other ways to implement it. References:
- [AXI4_Lite_master ](<https://zhuanlan.zhihu.com/p/550892140>)
- [AXI4_Lite_slave ](<https://zhuanlan.zhihu.com/p/550815975>)
### Introduction of AXI4-Stream
The signal interface of this protocol is as follows.

#### Handshake mechanism
It can be divided into the following three cases:
1. The master is ready to send data before the slave is ready to receive it

2. The master is ready to send data after the slave is ready to receive it

3. The master and slave are ready to send/receive data at the same time

In streams that have a concept of frames or packets, the TLAST signal marks the end of a packet. For example, when sending a 32-byte packet, TLAST can be pulled high while the 32nd byte is sent to indicate that the packet is complete.

For data streams with no concept of packets or frames, the default value of TLAST is undefined. The following options are available:
- Set TLAST low. This indicates that all transfers belong to the same packet.
- Set TLAST high. This indicates that every transfer is a separate packet.
- Automatically generate pulsed TLAST values. This option asserts TLAST after a fixed number of transfers, such as after two or 16 transfers.
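The three TLAST options can be expressed with a single counter. The following is a software sketch only; `Beat`, `frame_bytes`, and `count_packets` are hypothetical names, and the model assumes one byte of TDATA per transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One AXI4-Stream transfer, reduced to the two fields used in this sketch.
struct Beat { std::uint8_t tdata; bool tlast; };

// Tags each byte with TLAST. packet_len selects the policy:
//   0  -> TLAST always low  (all transfers belong to one packet)
//   1  -> TLAST always high (every transfer is its own packet)
//   N  -> TLAST pulses on every N-th transfer
inline std::vector<Beat> frame_bytes(const std::vector<std::uint8_t>& bytes,
                                     std::size_t packet_len) {
    std::vector<Beat> beats;
    beats.reserve(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const bool last = packet_len != 0 && ((i + 1) % packet_len == 0);
        beats.push_back(Beat{bytes[i], last});
    }
    return beats;
}

// Counts completed packets, i.e. transfers with TLAST high.
inline std::size_t count_packets(const std::vector<Beat>& beats) {
    std::size_t n = 0;
    for (const Beat& b : beats) {
        if (b.tlast) ++n;
    }
    return n;
}
```

With 64 bytes and `packet_len = 32`, TLAST is high on the 32nd and 64th transfers, which matches the 32-byte packet example above.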
### Verilog code of AXI4-Stream
Vivado provides a quick way to generate the Verilog code ([reference](<https://xilinx.eetrend.com/blog/2023/100568155.html>)):
- [AXI4_Stream_master code](<https://github.com/nthuyouwei/soclab/blob/main/lab_0/SOCStudy/axi_stream_master_v1_0_M00_AXIS.v>)
- [AXI4_Stream_slave code](<https://github.com/nthuyouwei/soclab/blob/main/lab_0/SOCStudy/axi_stream_slave_v1_0_S00_AXIS.v>)
## Interfaces of the HLS design (notes on [Vitis High-Level Synthesis User Guide (UG1399)](<https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/Interfaces-of-the-HLS-Design>) and [kernel-IO-Interface](<https://eeclass.nthu.edu.tw/media/doc/98179>))
Interfaces can be manually assigned using the INTERFACE pragma or directive.
### Interfaces for Vitis Kernel Flow
The Vitis kernel flow implements the following interfaces by default:

| C-argument type | Paradigm | Interface protocol (I/O/Inout) |
|-----|-----|--|
| Scalar (pass by value) | Register | AXI4-Lite (s_axilite) |
| Array | Memory | AXI4 Memory Mapped (m_axi) |
| Pointer to array | Memory | m_axi |
| Pointer to scalar | Register | s_axilite |
| Reference | Register | s_axilite |
| hls::stream | Stream | AXI4-Stream (axis) |

**Details of M_AXI Interfaces:**
AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application.
**Details of S_AXILITE Interfaces:**
Since the host and kernel occupy two separate compute spaces in the Vitis kernel flow, the "stack" is managed by the AMD Run Time (XRT), and communication is managed through the s_axilite interface. The kernel is software controlled through XRT by reading and writing the control registers of an s_axilite interface as described in S_AXILITE Control Register Map. The interface provides the following features:
- Control Protocols: the block control protocol defines control registers in the s_axilite interface that let you set control signals to manage execution and operation of the kernel.
- Rules for Bundle: the Vitis kernel flow supports only a single s_axilite interface.
  - If no bundle is specified, the tool automatically creates a default bundle named Control.
  - If you specify the bundle name, you must apply the same bundle to all s_axilite interfaces to create a single bundle.

**Details of AXIS Interfaces:**
The AXI4-Stream protocol (AXIS) defines a single uni-directional channel for streaming data in a sequential manner. The AXI4-Stream interfaces can burst an unlimited amount of data, which significantly improves performance. Unlike the AXI4 memory-mapped interface, which needs an address to read/write the memory, the AXIS interface simply passes data to another AXIS interface without needing an address, and so uses fewer device resources.
The AXI4-Stream works on an industry-standard ready/valid handshake between a producer and consumer, as shown in the [figure](https://hackmd.io/_uploads/B1SiLI_lT.png) below. The data transfer starts once the producer asserts the TVALID signal and the consumer responds by asserting the TREADY signal. This handshake of data and control continues until either TREADY or TVALID is set low, or the producer asserts the TLAST signal to indicate the last data packet of the transfer.
**Figure: AXI4-Stream Handshake**

**!! Important**: The AXIS interface can only be assigned to the top-level arguments (ports) of a kernel or IP, and cannot be assigned to the arguments of functions internal to the design. Streaming channels used inside the HLS design should use hls::stream.
### Interfaces for Vivado IP Flow
(Omitted.)
### Port-Level Protocols
The AXI4 interfaces supported by Vitis HLS include the AXI4-Stream interface (axis), AXI4-Lite (s_axilite), and AXI4 master (m_axi) interfaces. For a complete description of the AXI4 interfaces, see the [UG1037](<https://docs.xilinx.com/v/u/en-US/ug1037-vivado-axi-reference-guide>).
#### AXI4 Master Interface
In the Vitis kernel flow, the m_axi interface supports the following default features:
- Pointer and array arguments default to m_axi.
- Offset = slave: the transfer address is defined through the s_axilite interface.
- The default alignment is 64 bytes.
- The maximum read/write burst length is 16 by default.

You can use an AXI4 master interface on array or pointer/reference arguments, which Vitis HLS implements in one of the following modes:
- With individual data transfers, Vitis HLS reads or writes a single element of data for each address.
- With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode is possible when you use one of the following:
  - memcpy: memcpy cannot be inlined or pipelined, and can be problematic because it changes the type of the argument into char, which can lead to errors if array_partition, array_reshape, or struct disaggregate is used. Writing your own version of memcpy with explicit arrays and loops is recommended instead, because it gives better control.
  - A for loop with PIPELINE. When using a for loop to implement burst reads or writes, follow these requirements:
    - Pipeline the loop.
    - Access addresses in increasing order.
    - Do not place accesses inside a conditional statement.
    - For nested loops, do not flatten the loops, because this inhibits the burst operation.
    - Only one read and one write are allowed in a for loop, unless the ports are bundled in different AXI ports.
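
A loop that follows the requirements above might look like the sketch below. The function name `add_offset` and the bundle names `gmem0`/`gmem1` are chosen only for illustration. Whether the tool actually infers a burst depends on the tool version and on the rest of the design, so check the synthesis report. An ordinary C++ compiler ignores the HLS pragmas, which lets the same function run in a software test.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: out[i] = in[i] + offset for i in [0, n).
// One read and one write per iteration, increasing addresses, no conditional
// around the accesses, and the two ports are placed in different bundles.
void add_offset(const int* in, int* out, int n, int offset) {
#pragma HLS INTERFACE mode=m_axi port=in bundle=gmem0
#pragma HLS INTERFACE mode=m_axi port=out bundle=gmem1
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] + offset;
    }
}

// Software check of the kernel function on a small made-up input.
inline std::vector<int> demo_add_offset() {
    const std::vector<int> input = {1, 2, 3, 4};
    std::vector<int> output(input.size(), 0);
    add_offset(input.data(), output.data(), static_cast<int>(input.size()), 10);
    return output;
}
```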
The device resource consumption of the M_AXI adapter is the sum of all the write modules (the sizes of the FIFO_wreq module, buff_wdata, and FIFO_resp) and all the read modules (FIFO_rreq and buff_rdata). In general, the size of a FIFO is calculated as Width * Depth.
**Controlling AXI4 Burst Behavior**
An optimal AXI4 interface is one in which the design never stalls while waiting to access the bus, and after bus access is granted, the bus never stalls while waiting for the design to read/write. To create the optimal AXI4 interface, the following options are provided in the INTERFACE pragma or directive to specify the behavior of the bursts and optimize the efficiency of the AXI4 interface. Refer to AXI Burst Transfers for more information on burst transfers.
Some of these options use internal storage to buffer data and may have an impact on area and resources:
- latency: Specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be granted but the bus may stall waiting on the design to start the access.
- max_read_burst_length, max_write_burst_length: Specifies the maximum number of data values read/written during a burst transfer.
- num_read_outstanding, num_write_outstanding: Specifies how many read/write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size `num_read_outstanding * max_read_burst_length * word_size`.
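
As a worked example of the FIFO size formula, the sketch below multiplies the three factors. The helper name is hypothetical, `word_size` is assumed to be given in bits, and the numbers in the note after the code are example values only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for the formula in the text:
// size = num_outstanding * max_burst_length * word_size (word_size in bits here).
inline std::uint64_t outstanding_fifo_bits(unsigned num_outstanding,
                                           unsigned max_burst_length,
                                           unsigned word_bits) {
    return static_cast<std::uint64_t>(num_outstanding) * max_burst_length * word_bits;
}

// Same quantity in bytes (rounded down).
inline std::uint64_t outstanding_fifo_bytes(unsigned num_outstanding,
                                            unsigned max_burst_length,
                                            unsigned word_bits) {
    return outstanding_fifo_bits(num_outstanding, max_burst_length, word_bits) / 8u;
}
```

For example, 16 outstanding reads with a burst length of 16 and 32-bit words need 16 * 16 * 32 = 8192 bits, or 1024 bytes, of buffering. Doubling either the burst length or the number of outstanding requests doubles the storage.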
In summary, AXI burst performance is affected by:
- Latency
- Burst Length
- Num_outstanding
- Bus width
#### AXI4-Lite Interface
- Allows the design to be controlled by a CPU
- Port return

In the Vitis kernel flow, the default block control protocol is ap_ctrl_chain and is assigned to the s_axilite interface. In the Vivado IP flow, however, the default block control protocol is ap_ctrl_hs and is assigned to its own interface.
You can also assign the block control protocol to an s_axilite interface using the following INTERFACE pragma, as an example:
```
#pragma HLS INTERFACE mode=s_axilite port=return
```
To change the block control protocol you can also use the INTERFACE pragma or directive as follows:
```
#pragma HLS INTERFACE mode=ap_ctrl_chain port=return
```
- Address 0x0000-0x000F for block I/O protocol and interrupt control
The Control Register Map generated by Vitis HLS for the ap_ctrl_chain block protocol is provided below:
```
//hls code
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=c bundle=BUS_A
#pragma HLS INTERFACE mode=ap_vld port=b
*c += *a + *b;
}
```
```
// ==============================================================
// Vitis HLS - High-Level Synthesis from C, C++ and OpenCL v2023.1 (64-bit)
// Tool Version Limit: 2023.04
// Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
// Copyright 2022-2023 Advanced Micro Devices, Inc. All Rights Reserved.
//
// ==============================================================
//In the Control Register Map, HLS reserve address 0x00 - 0x0c for the block level protocols and interrupt controls.
...
//------------------------Address Info-------------------
// 0x00 : Control signals
// bit 0 - ap_start (Read/Write/COH)
// bit 1 - ap_done (Read)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read/COR)
// bit 4 - ap_continue (Read/Write/SC)
// bit 7 - auto_restart (Read/Write)
// bit 9 - interrupt (Read)
// others - reserved
// 0x04 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
// bit 0 - enable ap_done interrupt (Read/Write)
// bit 1 - enable ap_ready interrupt (Read/Write)
// others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
// bit 0 - ap_done (Read/TOW)
// bit 1 - ap_ready (Read/TOW)
// others - reserved
// 0x10 : Data signal of a
// bit 7~0 - a[7:0] (Read/Write)
// others - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
// bit 7~0 - b[7:0] (Read/Write)
// others - reserved
// 0x1c : reserved
// 0x20 : Data signal of c_i
// bit 7~0 - c_i[7:0] (Read/Write)
// others - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
// bit 7~0 - c_o[7:0] (Read)
// others - reserved
// 0x2c : reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
```
- Controlling Hardware - hw.h
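
The contents of hw.h are not shown in these notes, so the following is only a sketch of how a host could drive the register map above. The register offsets and bit positions are taken from the map; everything else (`FakeKernel`, `run_example`, `demo_run`) is hypothetical. `FakeKernel` is a software stand-in for the s_axilite slave so that the sequence can run on a PC; it finishes immediately and ignores the ap_vld setting on port b.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Offsets and control bits from the generated register map.
constexpr std::uint32_t REG_CTRL = 0x00, REG_A = 0x10, REG_B = 0x18,
                        REG_C_I = 0x20, REG_C_O = 0x28;
constexpr std::uint32_t AP_START = 1u << 0, AP_DONE = 1u << 1, AP_CONTINUE = 1u << 4;

// Software stand-in for the kernel's register file (assumption, not real hardware).
struct FakeKernel {
    std::map<std::uint32_t, std::uint32_t> regs;

    void write(std::uint32_t addr, std::uint32_t value) {
        if (addr != REG_CTRL) { regs[addr] = value; return; }
        if (value & AP_START) {
            // Model of `*c += *a + *b` on 8-bit data, completing at once.
            regs[REG_C_O] = (regs[REG_C_I] + regs[REG_A] + regs[REG_B]) & 0xFFu;
            regs[REG_CTRL] |= AP_DONE;
        }
        if (value & AP_CONTINUE) regs[REG_CTRL] &= ~AP_DONE;  // output consumed
    }
    std::uint32_t read(std::uint32_t addr) { return regs[addr]; }
};

// Host-side sequence: write arguments, start, poll ap_done, read c_o, continue.
template <typename Device>
std::uint32_t run_example(Device& dev, std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    dev.write(REG_A, a);
    dev.write(REG_B, b);
    dev.write(REG_C_I, c);
    dev.write(REG_CTRL, AP_START);
    while ((dev.read(REG_CTRL) & AP_DONE) == 0) { /* poll until ap_done */ }
    const std::uint32_t result = dev.read(REG_C_O) & 0xFFu;
    dev.write(REG_CTRL, AP_CONTINUE);
    return result;
}

inline std::uint32_t demo_run(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    FakeKernel kernel;
    return run_example(kernel, a, b, c);
}
```

On real hardware, `read` and `write` would be replaced by accesses to the memory-mapped base address of the IP, and polling could be replaced by the interrupt registers at 0x04 to 0x0c.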
#### AXI4-Stream Interface
**!! Important**: hls::axis (and ap_axiu/ap_axis) cannot be used on internal functions or variables as the AXI4-Stream protocol is only supported on the interfaces of top-level functions. For internal functions or variables you must use hls::stream objects as described in [HLS Stream Library](<https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/HLS-Stream-Library>).

An AXI4-Stream interface can be applied to any input argument and any array or pointer output argument. Because an AXI4-Stream interface transfers data in a sequential streaming manner, it cannot be used with arguments that are both read and written. In terms of data layout, the data type of the AXI4-Stream is aligned to the next byte. For example, if the size of the data type is 12 bits, it will be extended to 16 bits. Depending on whether a signed/unsigned interface is selected, the extended bits are either sign-extended or zero-extended.
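
The byte alignment and the extension of the padding bits can be illustrated with plain integer arithmetic. This is a sketch of the rule described above, not code generated by the tool; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rounds a data width in bits up to the next whole number of bytes.
constexpr unsigned axis_padded_width(unsigned bits) {
    return (bits + 7u) / 8u * 8u;
}

// Extends a 12-bit value to 16 bits: sign-extended for a signed interface,
// zero-extended for an unsigned one.
inline std::uint16_t extend_12_to_16(std::uint16_t raw, bool is_signed) {
    std::uint16_t value = raw & 0x0FFFu;          // keep the 12 data bits
    const bool negative = (value & 0x0800u) != 0; // bit 11 is the sign bit
    if (is_signed && negative) value |= 0xF000u;  // copy the sign into bits 15..12
    return value;
}
```

A 12-bit type therefore occupies 16 bits of TDATA, and the 12-bit pattern 0x800 becomes 0xF800 on a signed interface and 0x0800 on an unsigned one.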
**How AXI4-Stream is Implemented**
If your design requires a streaming interface, begin by defining and using a streaming data structure like hls::stream in Vitis HLS. This simple object encapsulates the requirements of streaming, and its streaming interface is by default implemented in the RTL as a FIFO interface (ap_fifo), but can optionally be implemented as a handshake interface (ap_hs) or an AXI4-Stream interface (axis).
If an AXI4-Stream interface (axis) is specified via the interface pragma mode option, the interface implementation will mimic the style of an AXIS interface by defining the TDATA, TVALID and TREADY signals.
If a more formal AXIS implementation is desired, then Vitis HLS requires the usage of a special data type (hls::axis defined in ap_axi_sdata.h) to encapsulate the requirements of the AXI4-Stream protocol and implement the special RTL signals needed for this interface.
The AXI4-Stream interface is implemented as a struct type in Vitis HLS and has the following signature (defined in ap_axi_sdata.h):
```
template <typename T, size_t WUser, size_t WId, size_t WDest> struct axis { .. };
```
- T: the data type to be streamed
- WUser: width of the TUSER signal
- WId: width of the TID signal
- WDest: width of the TDEST signal

When the stream data type (T) is a simple integer type, there are two predefined types of AXI4-Stream implementations available:
- A signed implementation of the AXI4-Stream class
```
ap_axis<WData, WUser, WId, WDest>
hls::axis<ap_int<WData>, WUser, WId, WDest>
```
- An unsigned implementation of the AXI4-Stream class
```
ap_axiu<WData, WUser, WId, WDest>
hls::axis<ap_uint<WData>, WUser, WId, WDest>
```
The value specified for the WUser, WId, and WDest template parameters controls the usage of side-channel signals in the AXI4-Stream interface.
When the hls::axis class is used, the generated RTL will typically contain the actual data signal TDATA, and the following additional signals: TVALID, TREADY, TKEEP, TSTRB, TLAST, TUSER, TID, and TDEST.
TVALID, TREADY, and TLAST are necessary control signals for the AXI4-Stream protocol. TKEEP, TSTRB, TUSER, TID, and TDEST signals are optional special signals that can be used to pass around additional bookkeeping data.
**More details on hls::stream<>**
- `#include <hls_stream.h>`
- `hls::stream<Type, Depth>`
  - Type
    - C++ native data type
    - HLS arbitrary-precision type, e.g. ap_int<>
    - User-defined struct containing the above types
  - Depth: depth of the FIFO for co-simulation verification
- Used for top-level function arguments and between functions
- The top-level interface can be
  - FIFO interface: ap_fifo (default), supports non-blocking behavior
  - Handshake interface: ap_hs
  - AXI4-Stream: axis
- Inside a function: a FIFO with depth = 2 (#pragma HLS STREAM depth = \<int\>). Note: here the depth specifies the actual resource allocation.
- Pass streams into and out of functions by reference
- Only in C++ based designs
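
The pass-by-reference style can be tried without the Vitis HLS headers by using a small stand-in class. In the sketch below, `sim_stream` is a hypothetical replacement for hls::stream that models only `read()`, `write()`, and `empty()`; in a real design you would include hls_stream.h and use `hls::stream<int>` instead. The three functions are made up for illustration.

```cpp
#include <cassert>
#include <queue>

// Stand-in for hls::stream<T> (assumption: only the basic blocking API is modeled).
template <typename T>
class sim_stream {
public:
    void write(const T& value) { fifo_.push(value); }
    T read() { T value = fifo_.front(); fifo_.pop(); return value; }
    bool empty() const { return fifo_.empty(); }
private:
    std::queue<T> fifo_;
};

// Streams are always passed by reference, never by value.
void producer(sim_stream<int>& out, int n) {
    for (int i = 0; i < n; ++i) out.write(i);
}

void doubler(sim_stream<int>& in, sim_stream<int>& out, int n) {
    for (int i = 0; i < n; ++i) out.write(2 * in.read());
}

int sum_consumer(sim_stream<int>& in, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += in.read();
    return sum;
}

// Chains the three stages through two streams and returns the final sum.
inline int demo_pipeline(int n) {
    sim_stream<int> a_to_b, b_to_c;
    producer(a_to_b, n);
    doubler(a_to_b, b_to_c, n);
    return sum_consumer(b_to_c, n);
}
```

Each value is read exactly once, in the order it was written, which is why an argument that is both read and written cannot be mapped onto a stream.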
### Other Port-Level Protocols for Vivado IP Flow
(Omitted.)
### Block-Level Control Protocols
The execution mode of a Vitis kernel or Vivado IP is defined by the block-level control protocol and the structure of sub-functions within the HLS design. Host application/driver controls kernel functions.
You can specify the block-level control protocol on the function return using the INTERFACE pragma or directive.
There are three kinds of block-level protocol:
- ap_ctrl_hs (default: Sequential Mode)
  Host and kernel synchronize through:
  - ap_start
  - ap_done

  The kernel can only be restarted (ap_start) after it completes the current execution (ap_done), so it serves one execution request at a time.
  Signals of the ap_ctrl_hs protocol:
  - ap_start (i): set to 1 to start, and keep it at 1 until ap_ready is asserted.
  - ap_ready (o): the design is ready to accept new input.
  - ap_done (o): the design has completed all operations; indicates that the data on ap_return is valid.
  - ap_idle (o): the design is idle when this signal is high.
  - (ap_return): return data.
  - Whether execution is pipelined or non-pipelined depends on the ap_ready timing.

  The ap_ctrl_hs interface behaves the same as ap_ctrl_chain, except that it has no ap_continue signal, which is equivalent to holding ap_continue at 1.
- ap_ctrl_chain (Pipeline Mode)
  Host and kernel synchronize through:
  - ap_start
  - ap_ready
  - ap_done
  - ap_continue

  The two processes (input synchronization and output synchronization) run asynchronously.

  Signals of the ap_ctrl_chain protocol:
  - ap_start (i): Asserted when the kernel can start processing data. Cleared on handshake with ap_done being asserted.
  - ap_done (o): Asserted when the kernel has completed operation. Cleared on read.
  - ap_idle (o): Asserted when the kernel is idle.
  - ap_ready (o): Asserted by the kernel when it is ready to accept new data.
  - ap_continue (i): Asserted by XRT to allow the kernel to keep running.
  - (ap_return): return data.

**Figure: Behavior of ap_ctrl_chain Interface**

The timing diagram displays the following behavior after reset occurs:
1. The block waits for ap_start to go High before it begins operation.
2. Output ap_idle goes Low immediately to indicate the design is no longer idle.
3. The ap_start signal must remain High until ap_ready goes High. Once ap_ready goes High:
   - If ap_start remains High the design will start the next transaction.
   - If ap_start is taken Low, the design will complete the current transaction and halt operation.
4. Data can be read on the input ports.
5. Data can be written to the output ports.
6. Output ap_done goes High when the block completes operation.
7. The ap_ctrl_chain control protocol provides an active-High ap_continue signal that indicates when the downstream block that consumes the output data is ready for new data inputs. This allows the downstream block to provide back-pressure to prevent the flow of data.
   - If the ap_continue signal is High when ap_done is High, the design continues operating.
   - If the downstream block is not able to consume new data inputs, the ap_continue signal is Low. If the ap_continue signal is Low when ap_done is High, the design stops operating, and the ap_done signal remains High waiting for ap_continue to go High.
8. When the design is ready to accept new inputs, the ap_ready signal is pulsed High for one clock cycle. The ap_ready port of a downstream block can directly drive the ap_continue port. Following is additional information about the ap_ready signal:
   - The ap_ready signal is inactive until the design starts operation.
   - In non-pipelined designs, the ap_ready signal is asserted at the same time as ap_done.
   - In pipelined designs, the ap_ready signal might go High at any cycle after ap_start is sampled High. This depends on how the design is pipelined.
   - If ap_start remains High after ap_ready goes High, the next transaction starts immediately.
   - When ap_start goes Low right after ap_ready goes High, the design keeps executing until ap_done is High and then stops operation, unless ap_start goes High again in the meantime, which starts a new transaction.
9. The ap_idle signal indicates when the design is idle and not operating. Following is additional information about the ap_idle signal:
   - If the ap_start signal is Low when ap_ready is High, the design stops operation, and the ap_idle signal goes High one cycle after ap_done.
   - If the ap_start signal is High when ap_ready is High, the design continues to operate, and the ap_idle signal remains Low.
- ap_ctrl_none (Free-running Mode)
  The host communicates with the kernel through streams; the kernel starts execution when data is available at its input.
  ap_ctrl_none has the same signals as ap_ctrl_chain, but the handshake signal ports (ap_start, ap_idle, ap_ready, and ap_done) are set high and optimized away.
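
The back-pressure behavior of ap_ctrl_chain described earlier (ap_done stays High until ap_continue is High) can be sketched as a small cycle-level model. This is a simplified illustration, not the logic Vitis HLS generates: it assumes a non-pipelined block with a fixed latency, so ap_ready coincides with ap_done and is not modeled separately. All names are hypothetical.

```cpp
#include <cassert>

// Cycle-level model of a non-pipelined block using ap_ctrl_chain.
class ChainModel {
    enum class State { Idle, Busy, Done };
public:
    explicit ChainModel(int latency) : latency_(latency < 1 ? 1 : latency) {}

    // One rising clock edge with the given input levels.
    void tick(bool ap_start, bool ap_continue) {
        switch (state_) {
        case State::Idle:
            if (ap_start) start();
            break;
        case State::Busy:
            if (--remaining_ == 0) state_ = State::Done;
            break;
        case State::Done:              // ap_done is held until ap_continue is high
            if (ap_continue) {
                if (ap_start) start(); else state_ = State::Idle;
            }
            break;
        }
    }
    bool ap_done() const { return state_ == State::Done; }
    bool ap_idle() const { return state_ == State::Idle; }
private:
    void start() { state_ = State::Busy; remaining_ = latency_; }
    State state_ = State::Idle;
    int latency_;
    int remaining_ = 0;
};

// Runs one transaction, then holds ap_continue low for `stall` cycles.
// Returns the number of cycles with ap_done high, or -1 if the block
// did not return to idle afterwards.
inline int done_cycles_with_stall(int latency, int stall) {
    ChainModel m(latency);
    m.tick(true, false);                        // ap_start sampled high
    while (!m.ap_done()) m.tick(true, false);   // keep ap_start high until done/ready
    int high = 0;
    for (int i = 0; i < stall; ++i) { ++high; m.tick(false, false); }  // back-pressure
    ++high;                                     // cycle in which ap_continue is high
    m.tick(false, true);
    return m.ap_idle() ? high : -1;
}

// With ap_start and ap_continue both held high, a new transaction starts
// right after ap_done instead of the block going idle.
inline bool restarts_when_start_stays_high(int latency) {
    ChainModel m(latency);
    m.tick(true, true);
    while (!m.ap_done()) m.tick(true, true);
    m.tick(true, true);
    return !m.ap_idle() && !m.ap_done();
}
```

Holding ap_continue at 1 in this model gives the ap_ctrl_hs behavior: ap_done is High for a single cycle per transaction.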