# PYNQ Tutorial 3: Block Memory and Custom IP
## Objective
After you complete this tutorial, you should be able to:
- Understand how Xilinx Block Memory (BRAM) works.
- Create a custom RTL design that accesses BRAM.
## Source Code
This repository contains all the code required to follow this tutorial: https://github.com/yohanes-erwin/pemrograman_zynq/tree/main/pynq_part_3
## 1. Introduction
**Xilinx Memory**
Xilinx memories can be classified into two types:
- **FIFO**: AXI-Stream FIFO, generic FIFO
- **Addressable memory**: Block Memory Generator

> Memory can be instantiated in the block design or in Verilog code.

The Block Memory Generator is an IP core built on the dedicated block RAM (BRAM) resources of the FPGA, so BRAM does not consume flip-flop or LUT resources. The core has two fully independent ports, A and B, that access a shared memory space, and each port has both a write and a read interface.

Block memory is limited in size. It can be added to a design either in the block design (GUI) or in Verilog/VHDL code using Xilinx Parameterized Macros (XPM).
Block memory has two operating modes.
- **BRAM controller**: the address increments by 4 (byte addressing of 32-bit words). Used together with the AXI BRAM Controller IP, usually to interface with the PS. **Slow for large data transfers compared to AXI DMA.**
- **Stand Alone**: the address increments by 1 (word addressing). Usually used without the AXI BRAM Controller IP, for memory that is internal to the design (not directly connected to the PS).
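
The difference between the two addressing schemes can be sketched in Python. This is only an illustration: the helper names are hypothetical, and a 32-bit data width is assumed.

```python
WORD_BYTES = 4  # assumption: 32-bit data width, so 4 bytes per word


def bram_controller_address(word_index: int) -> int:
    """Byte address of a word in BRAM controller mode (steps of 4)."""
    return word_index * WORD_BYTES


def standalone_address(word_index: int) -> int:
    """Word address of the same word in stand-alone mode (steps of 1)."""
    return word_index


# Print the address of the first four words in both modes.
for i in range(4):
    print(i, hex(bram_controller_address(i)), standalone_address(i))
```

The same word therefore sits at address `0xc` in BRAM controller mode and at address `3` in stand-alone mode.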

The Block Memory Generator core uses embedded block RAM to generate five types of memories.
- Single-port RAM
- Simple Dual-port RAM
- True Dual-port RAM
- Single-port ROM
- Dual-port ROM

The following figure shows the **BRAM write timing diagram for the BRAM controller mode**. The address increments by 4 because it is a byte address and each data word is 32 bits (4 bytes) wide. Each byte of a word can be written individually, as selected by the `we` signal.
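
The effect of the byte write enable can be modeled in a few lines of Python. This is a behavioral sketch, not the actual hardware: the function name is hypothetical, and it assumes that bit `i` of a 4-bit `we` enables byte `i` (bits `8*i+7` down to `8*i`) of the 32-bit word.

```python
def write_word(old: int, new: int, we: int) -> int:
    """Return the stored 32-bit word after a write with a 4-bit byte enable."""
    result = old
    for byte in range(4):
        if (we >> byte) & 1:
            mask = 0xFF << (8 * byte)
            # Replace only the enabled byte, keep the others.
            result = (result & ~mask) | (new & mask)
    return result & 0xFFFFFFFF


full = write_word(0x11223344, 0xAABBCCDD, 0b1111)     # all four bytes written
partial = write_word(0x11223344, 0xAABBCCDD, 0b0001)  # lowest byte only
print(hex(full), hex(partial))
```

With `we = 0b1111` the whole word is replaced, while `we = 0b0001` only updates the least significant byte.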

The following figure shows the BRAM read timing diagram for the BRAM controller mode. The read latency from address to data output is one clock cycle.
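
The one-cycle read latency can be illustrated with a small cycle-level model in Python. The class and method names are hypothetical, and the model only covers reads. Writes are applied directly, without timing.

```python
class SyncBram:
    """Behavioral model of a memory with a registered (one-cycle) read."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.dout = 0  # registered data output

    def write(self, addr: int, data: int) -> None:
        """Store data (write timing is not modeled here)."""
        self.mem[addr] = data

    def tick(self, addr: int) -> int:
        """Advance one clock cycle while `addr` is presented.

        Returns the value visible on dout during this cycle, which belongs
        to the address presented in the previous cycle.
        """
        visible = self.dout
        self.dout = self.mem[addr]  # captured at the clock edge
        return visible


bram = SyncBram(4)
for address, value in enumerate([10, 20, 30, 40]):
    bram.write(address, value)

# Present addresses 0..3, then hold the last address for one extra cycle.
seen = [bram.tick(a) for a in [0, 1, 2, 3, 3]]
print(seen)
```

Each value in `seen` appears one cycle after its address was presented, so the first entry is still the initial output value.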

The BRAM reset is active-high.

BRAM capacity is limited: 2.1 Mb on the Zybo and 4.9 Mb on the PYNQ-Z1.

**System Design**
In this tutorial, we are going to create a system that consists of Zynq PS and PL design. The PL design is a simple processing element (PE) module that does multiply and add. The design is similar to the previous tutorial, but instead of AXI DMA, we use BRAM controllers and block memory.

Jupyter Notebook runs on the PS CPU, so we can access the memory from Python: https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_3/part_3.ipynb
This is the block diagram of the PE module. It performs a multiply-and-add operation.

This is the Verilog implementation of the PE module.
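```verilog
// Processing element: y_out = y_in + (a_in * b), with a_in passed through
module pe
    #(
        parameter WIDTH = 8,     // Bit width of the inputs and outputs
        parameter FRAC_BIT = 0   // Number of fractional bits (fixed point)
    )
    (
        input wire signed [WIDTH-1:0] a_in,
        input wire signed [WIDTH-1:0] y_in,
        input wire signed [WIDTH-1:0] b,
        output wire signed [WIDTH-1:0] a_out,
        output wire signed [WIDTH-1:0] y_out
    );

    // Full-precision product (2*WIDTH bits)
    wire signed [WIDTH*2-1:0] y_out_i;

    assign a_out = a_in;        // Pass a_in through unchanged
    assign y_out_i = a_in * b;  // Multiply
    // Add the product, truncated back to WIDTH bits, to y_in
    assign y_out = y_in + y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT];

endmodule
```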
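
The arithmetic of the PE module can be cross-checked with a software model. The following Python sketch mirrors the Verilog above bit for bit, including the wrap-around on overflow. The function names are hypothetical, and Python is used here only because it is also the language on the PS side.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def pe(a_in: int, y_in: int, b: int, width: int = 8, frac_bit: int = 0):
    """Software model of the PE: returns (a_out, y_out)."""
    product = a_in * b  # full-precision signed product
    # Select bits [width+frac_bit-1 : frac_bit] of the product.
    sliced = (product >> frac_bit) & ((1 << width) - 1)
    # The addition wraps around at `width` bits, as in the hardware.
    y_out = to_signed(y_in + sliced, width)
    return a_in, y_out


print(pe(3, 4, 5))                  # integer mode: 4 + 3 * 5
print(pe(-2, 1, 3))                 # negative product: 1 + (-2) * 3
print(pe(100, 0, 2))                # overflow wraps around at 8 bits
print(pe(24, 0, 32, frac_bit=4))    # Q4.4 fixed point: 1.5 * 2.0
```

With `frac_bit=4` the inputs are read as Q4.4 values, so `24` is 1.5, `32` is 2.0, and the result `48` is 3.0.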
This is the Verilog implementation of the PE top module.

**We add a ready signal to make sure the process is finished before the PS (Python) code reads the result.**

A rising-edge detector generates a single-clock-cycle start pulse from the level signal.
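
The behavior of a rising-edge detector can be sketched as a cycle-level Python model. This is not the code from the repository: the class name is hypothetical, and the exact pulse timing of the real circuit may differ by a register stage.

```python
class RisingEdge:
    """Outputs 1 for exactly one cycle when the input goes from 0 to 1."""

    def __init__(self):
        self.prev = 0  # input level sampled on the previous clock

    def tick(self, level: int) -> int:
        pulse = int(bool(level) and not self.prev)
        self.prev = level
        return pulse


detector = RisingEdge()
# The level stays high for three cycles, drops, then rises again.
pulses = [detector.tick(level) for level in [0, 1, 1, 1, 0, 1]]
print(pulses)
```

However long the level stays high, the detector emits only one pulse per 0-to-1 transition, so the core is started once per write from the PS.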

## 2. Memory Project
### 2.1. Create Hardware Design
Follow these steps to create the custom IP project:
- Create a new Vivado project for your board.
- Add the **ZYNQ7 PS** IP, and then click **Run Block Automation**.
- Then add the following IPs: **one AXI GPIO, two AXI BRAM Controllers, and two Block Memory Generators**.

- Configure the **AXI GPIO** to have dual channels, one for **1-bit output** and the other for **1-bit input**.

- Configure the **AXI BRAM Controller** to have only **one interface**.

- Configure the **Block Memory Generator** to **True Dual Port RAM**.

- Click **Run Connection Automation** and check the **S_AXI** ports of the **GPIO** and **BRAM** controllers to connect them to the PS. Then, click **OK**.

- The following figure shows the connection to the GPIO and BRAM.

- Add the RTL source files **pe_top.v**, **pe.v**, and **register.v** to the project.
- Add the **pe_top** to the **block design**. Then, connect it to **GPIO** and **BRAM** as shown in the following figure.

- In the **Sources** section, **right-click on the design_1** block design, then select the **Create HDL Wrapper** menu.
- After that, **right-click on the design_1_wrapper**, then select the **Set as Top** menu.

- On the left side (the **Flow Navigator**), select the **Generate Block Design** menu. In the **Synthesis Options** section, select the **Global** option.
- Run **synthesis** and **implementation**, then **generate the bitstream**.
- Copy the `.tcl`, `.bit`, and `.hwh` files to the FPGA board.
### 2.2. Create Software Design
At this point, the required files to program the FPGA are already on the board. The next step is to create Jupyter Notebook files.
- Open a web browser and open **Jupyter Notebook** on the board. Create a new file from the menu **New, Python 3 (ipykernel)**.
- Write the following code to test the design: https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_3/part_3.ipynb
- This program initializes the GPIO and BRAMs, writes the data to the input BRAM, starts the PE core, and reads the result from the output BRAM.
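
The overall flow of such a program can be outlined as follows. This is a rough sketch, not the notebook from the repository, which remains the reference. It assumes register-style objects with `read(offset)` and `write(offset, value)` methods (the shape that PYNQ's `MMIO` class offers), that the start bit is on GPIO channel 1 and the ready bit on channel 2, and that data is stored as consecutive 32-bit words. `run_pe` and `FakeRegs` are hypothetical names, and `FakeRegs` is only a stand-in so that the flow can be dry-run off the board.

```python
GPIO_CH1_DATA = 0x00  # AXI GPIO channel 1 data register
GPIO_CH2_DATA = 0x08  # AXI GPIO channel 2 data register


def run_pe(gpio, bram_in, bram_out, values, n_out, max_polls=1000):
    """Write inputs, start the core, wait for ready, and read the results."""
    # BRAM controller mode: byte addresses, so each word is 4 bytes apart.
    for i, value in enumerate(values):
        bram_in.write(4 * i, value)

    # Assumption: a 0 -> 1 transition on channel 1, bit 0 starts the core.
    gpio.write(GPIO_CH1_DATA, 0)
    gpio.write(GPIO_CH1_DATA, 1)

    # Assumption: channel 2, bit 0 is the ready flag.
    for _ in range(max_polls):
        if gpio.read(GPIO_CH2_DATA) & 1:
            break
    else:
        raise TimeoutError("PE did not report ready")

    gpio.write(GPIO_CH1_DATA, 0)
    return [bram_out.read(4 * i) for i in range(n_out)]


class FakeRegs:
    """Dictionary-backed stand-in for a memory-mapped region (no hardware)."""

    def __init__(self, preset=None):
        self.regs = dict(preset or {})

    def write(self, offset, value):
        self.regs[offset] = value

    def read(self, offset):
        return self.regs.get(offset, 0)


# Dry run: ready always reads 1, and the output BRAM holds a placeholder
# value (42) that is NOT a real PE result.
gpio = FakeRegs({GPIO_CH2_DATA: 1})
bram_in = FakeRegs()
bram_out = FakeRegs({0: 42})
result = run_pe(gpio, bram_in, bram_out, [1, 2, 3], n_out=1)
print(result, bram_in.regs)
```

On the board, the three objects would be created from the base addresses of the AXI GPIO and the two AXI BRAM Controllers in the loaded overlay instead of `FakeRegs`.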