# PYNQ Tutorial 3: Block Memory and Custom IP

## Objective

After you complete this tutorial, you should be able to:

- Understand how Xilinx Block Memory (BRAM) works.
- Create a custom RTL design that accesses BRAM.

## Source Code

This repository contains all of the code required to follow this tutorial:

https://github.com/yohanes-erwin/pemrograman_zynq/tree/main/pynq_part_3

## 1. Introduction

**Xilinx Memory**

Xilinx memories can be classified into two types:

- **FIFO**: AXI-Stream FIFO, generic FIFO
- **Addressable memory**: Block Memory Generator

![Screenshot 2024-11-12 105607](https://hackmd.io/_uploads/H1gpG8lMJg.png)

> Memory can be instantiated in block design or using Verilog code.

The Block Memory Generator core is built from dedicated block RAM (BRAM) on the FPGA. This means that BRAM does not use flip-flop or LUT resources. The core has two fully independent ports that access a shared memory space. Both the A and B ports have a write and a read interface. Block memory has a limited size. Block memory can be added to the design using block design (GUI) or with Verilog/VHDL (Xilinx Parameterized Macros, XPM) code.

Block memory has two operating modes:

- **BRAM controller**: the address increments by 4 from one 32-bit word to the next (byte addressing). Used together with the AXI BRAM Controller IP. Usually used for interfacing with the PS. **Slow for large data transfers compared to AXI DMA.**
- **Stand Alone**: the address increments by 1 from one word to the next. Usually used without the AXI BRAM Controller IP, for memory that is internal to the design (not directly connected to the PS).

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FfUiQiWtTiWdOgF1XJz9Q%252Fbram_mode.png%3Falt%3Dmedia%26token%3D047e264e-69b7-4f32-ac6b-73b9a0a8d9ab&width=768&dpr=1&quality=100&sign=9c1b7dab&sv=1)

The Block Memory Generator core uses embedded block RAM to generate five types of memories:
- Single-port RAM
- Simple Dual-port RAM
- True Dual-port RAM
- Single-port ROM
- Dual-port ROM

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FAx6xb2hdMEwLKQ8NQnC1%252Fbram_type.png%3Falt%3Dmedia%26token%3Daed9984d-3d6c-4e7a-97bf-4c4c93ae1462&width=768&dpr=1&quality=100&sign=7bdc2394&sv=1)

The following figure shows the **BRAM write timing diagram for the BRAM controller mode**. The address increments by 4 on each access because every data word is 32 bits (4 bytes) wide and the address is a byte address. Each byte of a word can be written individually: the `we` signal acts as a per-byte write enable.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FLWSgWs8mH0pBkPGx55s1%252Fbram_timing_write.png%3Falt%3Dmedia%26token%3De3dc2ad2-292f-4d3a-8d51-010eb9cd4083&width=768&dpr=1&quality=100&sign=56a20fff&sv=1)

The following figure shows the BRAM read timing diagram for the BRAM controller mode. The latency from address input to data output is one clock cycle.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FRaQ3fX2gR1y04i8vbsFo%252Fbram_timing_read.png%3Falt%3Dmedia%26token%3Dd1746e9d-9959-420d-a332-f5ff264516d9&width=768&dpr=1&quality=100&sign=7042292c&sv=1)

The reset type for BRAM is active-high. BRAM size is limited: 2.1 Mb on Zybo and 4.9 Mb on PYNQ Z1.

**System Design**

In this tutorial, we are going to create a system that consists of the Zynq PS and a PL design. The PL design is a simple processing element (PE) module that performs a multiply and add. The design is similar to the previous tutorial, but instead of AXI DMA, we use BRAM controllers and block memory.
![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FvtR3lQQJG4WmYjLczZKJ%252Fblock_diagram_memory.jpg%3Falt%3Dmedia%26token%3D81bc9b96-9f0f-4dd5-b934-acfc63324832&width=768&dpr=1&quality=100&sign=bdc2e64a&sv=1)

Jupyter Notebook runs on the PS CPU, so we can access the memory from Python code.

https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_3/part_3.ipynb

This is the block diagram of the PE module. It performs a multiply and add operation.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FLcZgWAp1rk5HxoRQDUWC%252Fpe.jpg%3Falt%3Dmedia%26token%3D22cbdfa0-10e3-4ff9-824e-6da5d151fd8d&width=768&dpr=4&quality=100&sign=6e1df3ab&sv=1 =300x)

This is the Verilog implementation of the PE module.

```Verilog=
module pe
    #(
        parameter WIDTH = 8,     // Data width in bits
        parameter FRAC_BIT = 0   // Number of fractional bits (fixed point)
    )
    (
        input wire signed [WIDTH-1:0] a_in,
        input wire signed [WIDTH-1:0] y_in,
        input wire signed [WIDTH-1:0] b,
        output wire signed [WIDTH-1:0] a_out,
        output wire signed [WIDTH-1:0] y_out
    );

    // Full-precision product of a_in and b
    wire signed [WIDTH*2-1:0] y_out_i;

    // Pass a_in through unchanged
    assign a_out = a_in;
    assign y_out_i = a_in * b;
    // Drop FRAC_BIT fractional bits from the product, then add y_in
    assign y_out = y_in + y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT];

endmodule
```

The following figure shows the PE top module.
![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FQ86JpucBaoverMDbqrHF%252Fpe_top_module.jpg%3Falt%3Dmedia%26token%3Ddb321a65-e91b-4e89-9098-e0756231916a&width=768&dpr=4&quality=100&sign=97bd250b&sv=1 =650x)

**We add a ready signal to make sure the process is finished before the PS (Python) code reads the result.**

![Screenshot 2024-11-12 131622](https://hackmd.io/_uploads/BJ-AGOlGyl.png)

A rising-edge detector circuit converts the level signal into a start pulse that lasts a single clock cycle.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252F3lAR6w3tQ56tAHfJ6W2V%252Fbram_pe_timing_diagram.png%3Falt%3Dmedia%26token%3Df1e7299b-93b3-4542-a49c-c6db5a1f5ce6&width=768&dpr=4&quality=100&sign=c269a6c0&sv=1)

## 2. Memory Project

### 2.1. Create Hardware Design

Follow these steps to create the custom IP project:

- Create a new Vivado project for your board.
- Add the **ZYNQ7 PS** IP, and then click **Run Block Automation**.
- Then add the following IPs: **one AXI GPIO, two AXI BRAM Controllers, two Block Memory Generators**.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FsIYMTN1Y5fK1HZ4gKvvy%252Fvivado_bram_block_design.png%3Falt%3Dmedia%26token%3D64b2c09d-6e81-4e59-a5f9-26775e33513c&width=768&dpr=4&quality=100&sign=f3fd7657&sv=1)

- Configure the **AXI GPIO** to have dual channels, one for **1-bit output** and the other for **1-bit input**.
![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FnLzhoX2q0fCW9SEouAJF%252Fvivado_bram_gpio_config.png%3Falt%3Dmedia%26token%3D8d3087dc-0166-4b68-9ae2-e833e11446a9&width=768&dpr=4&quality=100&sign=9402792&sv=1)

- Configure each **AXI BRAM Controller** to have only **one interface**.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FKFjTu6DPW6aukx6okLwr%252Fvivado_bram_controller_config.png%3Falt%3Dmedia%26token%3Db851a5ff-663c-4532-9a51-550a0ac8ef89&width=768&dpr=4&quality=100&sign=9000039c&sv=1)

- Configure each **Block Memory Generator** as **True Dual Port RAM**.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FF8ebOC1uhO0fGGnwRvd1%252Fvivado_bram_bram_config.png%3Falt%3Dmedia%26token%3D92b747c7-5ea0-4c2a-80d5-10a0b0c90aad&width=768&dpr=4&quality=100&sign=b3bc71c7&sv=1)

- Click **Run Connection Automation** and check the **S_AXI** ports of the **GPIO** and **BRAM** controllers to connect them to the PS. Then, click **OK**.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FcJMT8LENWO4Lsq0VsDsl%252Fvivado_bram_run_automation.png%3Falt%3Dmedia%26token%3De942b5a3-8aab-40a8-9797-2b7aec2defaa&width=768&dpr=4&quality=100&sign=7c21b530&sv=1)

- The following figure shows the connection to the GPIO and BRAM.
![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FGkaOIi2Gj9a18yYSb91X%252Fvivado_bram_connection.png%3Falt%3Dmedia%26token%3D88af5e0e-7322-4070-9823-6c6b99eb0d45&width=768&dpr=4&quality=100&sign=b92f8e24&sv=1)

- Add the RTL design files **pe_top.v**, **pe.v**, and **register.v**.
- Add **pe_top** to the **block design**. Then, connect it to the **GPIO** and **BRAM** as shown in the following figure.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FQc9dxe4lRQczoDNn57qU%252Fvivado_bram_pe_top.png%3Falt%3Dmedia%26token%3Daf4f83bc-c360-4496-82a4-9923a750dfed&width=768&dpr=4&quality=100&sign=9b8e2da9&sv=1)

- In the **Sources** section, **right-click on the design_1** block design, then select the **Create HDL Wrapper** menu.
- After that, **right-click on the design_1_wrapper**, then select the **Set as Top** menu.

![](https://weenslab.gitbook.io/~gitbook/image?url=https%3A%2F%2F4146991827-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FIsb2SAYKLkGlVOGOY0EE%252Fuploads%252FWDLUo1oCpAlMFvkvwPdB%252Fvivado_bram_set_as_top.png%3Falt%3Dmedia%26token%3Dd4442720-944a-4408-88ff-623d6d1566ed&width=768&dpr=4&quality=100&sign=ff19ebdd&sv=1)

- On the left side (the **Flow Navigator**), select the **Generate Block Design** menu. In the **Synthesis Options** section, select the **Global** option.
- Run **synthesis** and **implementation**, then **generate the bitstream**.
- Copy the `.tcl`, `.bit`, and `.hwh` files to the FPGA board.

### 2.2. Create Software Design

At this point, the files required to program the FPGA are already on the board. The next step is to create the Jupyter Notebook file.
- Open a web browser and open **Jupyter Notebook** on the board. Create a new notebook from the menu **New, Python 3 (ipykernel)**.
- Write the following code to test the design: https://github.com/yohanes-erwin/pemrograman_zynq/blob/main/pynq_part_3/part_3.ipynb
- In this program, we initialize the GPIO and BRAMs, write data to the input BRAM, start the PE core, and then read the result from the output BRAM.
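
When checking the values read back from the output BRAM, it can help to compare them against a software reference. The following is a minimal sketch of such a reference, not code from the linked notebook. It models the arithmetic of `pe.v` shown earlier (signed multiply, drop `FRAC_BIT` fractional bits, add `y_in`, wrap to `WIDTH` bits) and the word-to-byte-offset conversion used in BRAM controller mode. The names `to_signed`, `pe_model`, and `word_to_byte_offset` are hypothetical helpers introduced for illustration only. How `pe_top` lays out its operands in the BRAMs is not covered here, so that part is left out.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        return value - (1 << width)
    return value


def pe_model(a_in: int, y_in: int, b: int, width: int = 8, frac_bit: int = 0):
    """Software model of pe.v (hypothetical helper). Returns (a_out, y_out)."""
    # Full-precision signed product, like the 2*WIDTH-bit wire y_out_i.
    product = a_in * b
    # Same bits as the Verilog slice y_out_i[WIDTH+FRAC_BIT-1:FRAC_BIT].
    product_slice = (product >> frac_bit) & ((1 << width) - 1)
    # The sum is truncated to WIDTH bits, so it wraps on overflow.
    y_out = to_signed(y_in + product_slice, width)
    return to_signed(a_in, width), y_out


def word_to_byte_offset(word_index: int, bytes_per_word: int = 4) -> int:
    """Byte offset of a 32-bit word in BRAM controller mode (step of 4)."""
    return word_index * bytes_per_word


if __name__ == "__main__":
    # Integer example (FRAC_BIT = 0): 10 + 3 * 4
    print(pe_model(3, 10, 4))
    # Fixed-point example with 4 fractional bits: 1.5 * 2.0 in Q4.4
    print(pe_model(24, 0, 32, width=8, frac_bit=4))
    # Byte offsets of the first four 32-bit words
    print([word_to_byte_offset(i) for i in range(4)])
```

On the board, offsets computed this way would be the ones passed to the memory-mapped read and write calls for each BRAM controller, and the values read back can then be compared with `pe_model` for the same inputs.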