# PYNQ Tutorial 2: DMA and Custom IP
## Objective
After you complete this tutorial, you should be able to:
- Understand how Xilinx AXI DMA works.
- Create a custom RTL design that accesses DMA.
## Source Code
This repository contains all the code required to follow this tutorial: https://github.com/yohanes-erwin/pemrograman_zynq/tree/main/pynq_part_2
## 1. Introduction
In the previous tutorial, we built a simple design using AXI GPIO.

In this tutorial, we create a system that combines the Zynq PS with a PL design. The PL design is a simple processing element (PE) module that performs a multiply-and-add operation. The end result is that we can send inputs to the PE module in the PL from a Jupyter Notebook.
**How do we add a custom RTL design to the system?**
This is the block diagram of the PE module. It performs a multiply-and-add operation.

This is the Verilog implementation of the PE module.
```Verilog=
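module pe
    #(
        parameter WIDTH = 8,    // bit width of every operand
        parameter FRAC_BIT = 0  // number of low product bits to drop
    )
    (
        input wire signed [WIDTH-1:0] a_in,
        input wire signed [WIDTH-1:0] y_in,
        input wire signed [WIDTH-1:0] b,
        output wire signed [WIDTH-1:0] a_out,
        output wire signed [WIDTH-1:0] y_out
    );

    // Full-precision (2*WIDTH-bit) product of a_in and b
    wire signed [WIDTH*2-1:0] y_out_i;

    assign a_out = a_in;        // pass a_in through unchanged
    assign y_out_i = a_in * b;  // multiply
    // Add: take WIDTH bits of the product starting at FRAC_BIT, then add y_in
    assign y_out = y_in + y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT];

endmodule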
```
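To check the hardware results later, it can help to have a software reference for the PE. The following is a minimal sketch of a bit-accurate Python model, assuming two's-complement wraparound as in the Verilog above. The function names `to_signed` and `pe` are chosen here for illustration and are not part of the repository.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def pe(a_in, y_in, b, width=8, frac_bit=0):
    """Software model of the `pe` module. Returns (a_out, y_out)."""
    a = to_signed(a_in, width)
    w = to_signed(b, width)
    # 2*WIDTH-bit product, like y_out_i in the Verilog
    product = (a * w) & ((1 << (2 * width)) - 1)
    # Equivalent of y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT]
    product_slice = (product >> frac_bit) & ((1 << width) - 1)
    # The sum is truncated to WIDTH bits, so it wraps on overflow
    total = to_signed(y_in, width) + to_signed(product_slice, width)
    return a, to_signed(total, width)


print(pe(3, 1, 2))    # 3 * 2 + 1
print(pe(100, 0, 2))  # 200 does not fit in 8 signed bits, so the result wraps
```

Note that the second call wraps around because the output is only 8 bits wide, which is the same behavior the hardware shows.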
**How do we connect the PE to PS?** We use direct memory access (DMA).
**What is DMA?**
> Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).
**What protocol is used in the Xilinx AXI DMA?** AXI Stream protocol.
An AXI stream (AXIS) module has two ports: master (**M_AXIS**) and slave (**S_AXIS**). Each port in this design uses four signals: **tready**, **tdata**, **tvalid**, and **tlast**. A data word is transferred on a clock edge where both **tvalid** and **tready** are high.
- **tready**: driven by the receiver to indicate that it is ready to accept data.
- **tdata**: the data itself.
- **tvalid**: driven by the sender to indicate that **tdata** holds valid data.
- **tlast**: marks the last word of a packet.
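
The handshake between these signals can be sketched in a few lines of Python. This is a simplified cycle-by-cycle model written for illustration, not code from the repository. It assumes the sender keeps **tvalid** high until all words are sent, and `axis_transfer` is a hypothetical helper name.

```python
def axis_transfer(words, ready_pattern):
    """Simulate one AXIS link, one loop iteration per clock cycle.

    words: data words the sender wants to send (tvalid stays high until done).
    ready_pattern: the receiver's tready value for each clock cycle.
    Returns (cycle, tdata, tlast) for every cycle in which a word was transferred.
    """
    accepted = []
    index = 0
    for cycle, tready in enumerate(ready_pattern):
        tvalid = index < len(words)
        if tvalid and tready:                  # transfer only when both are high
            tlast = index == len(words) - 1    # tlast marks the final word
            accepted.append((cycle, words[index], tlast))
            index += 1
    return accepted


# The receiver is not ready in cycle 1, so the second word waits until cycle 2.
for beat in axis_transfer([10, 20, 30], [True, False, True, True]):
    print(beat)
```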

AXI stream modules can be connected in a chain. Each block performs a specific processing step, then passes its result to the next block, as shown in this example.

The slave port is used to receive input, and the master port is used to send output. So, in our design, the slave port receives the inputs **a_in**, **b**, and **y_in**, and the master port sends the result **y_out**.
The PE module needs to be wrapped in a top module called `axis_pe`, stored in `axis_pe.v`. This top module implements the AXI stream protocol. Because the PE is purely combinational, the implementation is simple: the handshake signals are passed straight through.
```Verilog=
module axis_pe
    (
        input wire aclk,
        input wire aresetn,
        // *** Control ***
        input wire en,
        // *** AXIS slave port ***
        output wire s_axis_tready,
        input wire [31:0] s_axis_tdata,
        input wire s_axis_tvalid,
        input wire s_axis_tlast,
        // *** AXIS master port ***
        input wire m_axis_tready,
        output wire [31:0] m_axis_tdata,
        output wire m_axis_tvalid,
        output wire m_axis_tlast
    );

    wire [7:0] y_out;

    // AXI-Stream control: the handshake passes straight through
    assign s_axis_tready = m_axis_tready;
    // y_out is zero-extended to 32 bits; the output is forced to 0 when en is low
    assign m_axis_tdata = en ? {24'h000000, y_out} : 32'd0;
    assign m_axis_tvalid = s_axis_tvalid;
    assign m_axis_tlast = s_axis_tlast;

    // PE: inputs are packed into one 32-bit word
    // [7:0] = a_in, [15:8] = b, [23:16] = y_in, [31:24] unused
    pe #(8, 0) pe_0
    (
        .a_in(s_axis_tdata[7:0]),
        .y_in(s_axis_tdata[23:16]),
        .b(s_axis_tdata[15:8]),
        .a_out(),
        .y_out(y_out)
    );

endmodule
```
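On the software side, each input word has to follow the bit layout that `axis_pe` reads from **s_axis_tdata**. The sketch below shows one way to pack and unpack these words in Python. The helper names `pack_word` and `unpack_result` are assumptions made for this example and do not come from the repository.

```python
def pack_word(a_in, b, y_in):
    """Pack three signed 8-bit values into the 32-bit word read by axis_pe.

    Bits [7:0] carry a_in, bits [15:8] carry b, and bits [23:16] carry y_in.
    """
    return (a_in & 0xFF) | ((b & 0xFF) << 8) | ((y_in & 0xFF) << 16)


def unpack_result(word):
    """Extract y_out from an output word and interpret it as a signed 8-bit value."""
    low = word & 0xFF
    return low - 256 if low & 0x80 else low


print(hex(pack_word(3, 2, 1)))     # a_in=3, b=2, y_in=1
print(unpack_result(0x000000FB))   # 0xFB is -5 as a signed 8-bit value
```

Because the hardware zero-extends **y_out** to 32 bits, a negative result arrives as a small positive word (for example `0xFB`), so the sign has to be restored in software.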
This is the timing of our AXIS PE design.

Because the PE is a combinational circuit, the output is available in the same clock cycle as the input. More complex designs usually cannot do this and often need internal memory and state machines.
To connect an AXIS module to the PS, we can use a Xilinx IP called AXI DMA. The AXI DMA translates memory-mapped data (from DDR memory) into stream data and vice versa.

Jupyter Notebook runs on the PS CPU, so we can access the PE module from a notebook: https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_2/part_2.ipynb
**How does AXI DMA work?**
- The Python code **sends an instruction** to the AXI DMA to move **a certain amount of data** from **a specified address**.
- The data for a DMA transfer must be contiguous in physical memory, so we need to allocate a physically contiguous buffer.
- The AXI DMA then moves the data on its own, without the CPU copying it.
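
The idea behind these steps can be illustrated with a toy model in plain Python. This is only a conceptual sketch, not how the AXI DMA is implemented: `memory` is a list standing in for DDR with one 32-bit word per element, addresses are word indices rather than byte addresses, and `mm2s` and `s2mm` are hypothetical function names borrowed from the two DMA channels.

```python
def mm2s(memory, address, length):
    """Toy memory-to-stream channel: read a contiguous range and emit it as a stream.

    Yields (tdata, tlast) pairs; tlast is True only for the final word.
    """
    block = memory[address:address + length]   # one contiguous range
    for i, word in enumerate(block):
        yield word, i == len(block) - 1


def s2mm(memory, address, stream):
    """Toy stream-to-memory channel: write incoming words to memory until tlast."""
    count = 0
    for tdata, tlast in stream:
        memory[address + count] = tdata
        count += 1
        if tlast:
            break
    return count                                # number of words written


memory = [1, 2, 3, 0, 0, 0, 0, 0]
written = s2mm(memory, 4, mm2s(memory, 0, 3))   # move 3 words from index 0 to index 4
print(written, memory)
```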

## 2. Custom IP Project
### 2.1. Create Hardware Design
Follow these steps to build the custom IP project:
- Create a new Vivado project for your board.
- Add the **ZYNQ7 PS** IP, and then click **Run Block Automation**.

- From the **Add IP** menu, add an **AXI Direct Memory Access (AXI DMA)** IP.

- After the AXI DMA is added, **double-click on the AXI DMA IP** to configure it.

- Configure the AXI DMA as shown in this window, then click **OK**.

- Go back to the block design and click **Run Connection Automation**. Check the **S_AXI_LITE** port of the AXI DMA and then click **OK**.

- The following figure shows the block design after the AXI DMA and ZYNQ7 PS are connected.

- Next, we need to connect the DMA to the DDR memory. **Double-click the ZYNQ7** IP.
- On the **Page Navigator**, go to **PS-PL Configuration**. Enable the **S AXI HP0 interface** and the **S AXI HP2 interface**. Then, click **OK**.

- Go back to the block design and click **Run Connection Automation**.
- Check the **S_AXI_HP0** port of the ZYNQ7 PS, then in the **Options** set the **Master** to **/axi_dma_0/M_AXI_MM2S**.

- Check the **S_AXI_HP2** port of the ZYNQ7 PS, then in the **Options** set the **Master** to **/axi_dma_0/M_AXI_S2MM**. Then, click **OK**.

- After Run Connection Automation, the block design looks like the following figure. The **M_AXI_MM2S** and **M_AXI_S2MM** are connected to ZYNQ7 PS.

- On the left side (the **Flow Navigator**), select the **Add Sources** menu. Then, select **Add or create design sources**.

- Create a new file named `axis_pe.v`.

- Create another new file named `pe.v`.

- Go back to the block design, **right-click on the block design**, and select the **Add Module** menu.

- Add the `axis_pe` module to the block design.

- Connect the AXIS PE module to the AXI DMA: connect **s_axis** to the DMA's **M_AXIS_MM2S** port and **m_axis** to its **S_AXIS_S2MM** port. Also connect **aclk** and **aresetn**.

- From the **Add IP** menu, add a **Constant** IP.

- Connect the **Constant** IP to the **en** port of the **AXIS PE**.

- In the **Sources** section, **right-click on the design_1** design block, then select the **Create HDL Wrapper** menu.
- After that, **right-click on the design_1_wrapper**, then select the **Set as Top** menu.

- On the left side (the **Flow Navigator**), select the **Generate Block Design** menu. In the **Synthesis Options** section, select the **Global** option.
- Run **synthesis** and **implementation**, then **generate the bitstream**.
- Copy the `.tcl`, `.bit`, and `.hwh` files to the FPGA board.
### 2.2. Create Software Design
At this point, the files required to program the FPGA are already on the board. The next step is to create the Jupyter Notebook file.
- Open a web browser and open **Jupyter Notebook** on the board. Create a new notebook from the menu **New > Python 3 (ipykernel)**.
- Write the following code to test the design: https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_2/part_2.ipynb
- In this program, we initialize the DMA, then allocate physically contiguous memory for the input and output buffers. The PE computation is carried out by calling `dma_send.transfer()` and `dma_recv.transfer()`.
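
The flow described above can be sketched as follows. This is a minimal sketch and not the notebook's exact code: `run_pe` is a hypothetical helper, the bitstream name in the usage comment is a placeholder for your own file, and the DMA is assumed to have kept the instance name `axi_dma_0` from the block design. The DMA object and the `allocate` function are passed in as arguments so that the function can also be exercised without a board.

```python
def run_pe(dma, allocate, samples):
    """Send (a_in, b, y_in) tuples through the PE and return the signed results.

    dma: a PYNQ DMA object, for example overlay.axi_dma_0
    allocate: pynq.allocate, which returns physically contiguous buffers
    """
    n = len(samples)
    in_buf = allocate(shape=(n,), dtype="u4")    # input buffer, one 32-bit word per sample
    out_buf = allocate(shape=(n,), dtype="u4")   # output buffer of the same size

    # Pack each sample using the bit layout that axis_pe expects
    in_buf[:] = [(a & 0xFF) | ((b & 0xFF) << 8) | ((y & 0xFF) << 16)
                 for a, b, y in samples]

    dma_send, dma_recv = dma.sendchannel, dma.recvchannel
    dma_send.transfer(in_buf)    # memory to PE
    dma_recv.transfer(out_buf)   # PE to memory
    dma_send.wait()
    dma_recv.wait()

    # y_out is in the low byte; restore its sign
    results = []
    for word in out_buf:
        low = int(word) & 0xFF
        results.append(low - 256 if low & 0x80 else low)

    in_buf.freebuffer()
    out_buf.freebuffer()
    return results


# On the board (placeholder bitstream name):
# from pynq import Overlay, allocate
# overlay = Overlay("design_1.bit")
# print(run_pe(overlay.axi_dma_0, allocate, [(3, 2, 1)]))
```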