# DLIC 2020 HW

## ZYNQ System

TBD

## ZYNQ MMIO tutorial (aka HW3)

### Overview
- ![](https://i.imgur.com/tZSXCGq.png)

### What is MMIO?
- https://en.wikipedia.org/wiki/Memory-mapped_I/O
    - MMIO is a method of performing input/output (I/O) between the central processing unit (CPU) and peripheral devices in a computer.
- https://pynq.readthedocs.io/en/latest/pynq_libraries/mmio.html
    - The **MMIO class** allows a Python object to access addresses in the system memory mapped. In particular, registers and address space of peripherals in the PL (FPGA) can be accessed.
    - MMIO provides a simple but powerful way to access and control peripherals. For simple peripherals with a **small number of memory accesses**, or where performance is not critical, MMIO is usually sufficient for most developers. If performance is critical, or large amounts of data need to be transferred between PS and PL, using the Zynq HP interfaces with DMA IP and the PYNQ DMA class may be more appropriate.

### Steps
#### Create our Block Design in Xilinx Vivado as in this [video](https://www.xilinx.com/video/hardware/designing-with-vivado-ip-integrator.html)
- Our block design contains the following components
    - ![](https://i.imgur.com/6ydU6La.png)
    - ZYNQ7 Processing System (Master): ARM CPU
    - AXI Interconnect
    - AXI GPIO (Slave): Converts the GPIO signals to AXI bus signals
- The ZYNQ7 PS block and the AXI GPIO block communicate through the AXI4 protocol
    - Details of the AXI4 protocol can be found here
- Remember to change the AXI GPIO block property
    - ![](https://i.imgur.com/FZUpAkV.png)
- Connect the GPIO pins to the onboard LED pins (x4) and the GPIO2 pins to the button pins (x4)
- Remember to run **Connection Automation** and **Block Automation**

#### Create HDL wrapper
- Save the project
- Click on the **Sources** tab, right click on our **block design (design_1)**, and choose **Create HDL Wrapper**
- Choose **Let Vivado manage wrapper and auto-update**

#### Synthesis
- Place the **pynq-z2_v1.0.xdc** file in the **Constraints**
folder
- Click on **Generate Bitstream** and the design will be synthesized automatically
- **File->Close Implementation Design**
- **File->Export->Export Bitstream File**, save the file as **hw3_1.bit**
- **File->Export->Export Block Design**, save the file as **hw3_1.tcl**
- Upload the two files (hw3_1.bit and hw3_1.tcl) to jupyter notebook

#### MMIO info
- AXI GPIO is a Xilinx IP that allows the PS (master) to manipulate external I/Os (slave) by simply writing to memory-mapped addresses
- The AXI GPIO registers are mapped to fixed memory addresses. Writing to those addresses changes the value of the registers
- AXI GPIO registers
    - ![](https://i.imgur.com/LJGijTz.png)
- AXI GPIO Data Register
    - ![](https://i.imgur.com/WLCnvYl.png)
- AXI GPIO Tri-state register
    - ![](https://i.imgur.com/TjVvUNW.png)
- The MMIO address is the address the PS should write to. In our case, we use **0x4120_0000**
    - ![](https://i.imgur.com/3t5mNGN.png)

#### PYNQ code
```python=
from pynq import Overlay, MMIO

# Program the FPGA from ARM processor
axi_gpio_design = Overlay("./hw3_1.bit")

# gpio0_address is the physical address mapped to AXI GPIO
gpio0_address = axi_gpio_design.ip_dict['axi_gpio_0']['phys_addr']

# Compute memory-mapped address for GPIO (LEDs) and GPIO2 (buttons)
led_addr = gpio0_address + 0
btn_addr = gpio0_address + 8

# MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
leds = MMIO(led_addr, 8)
buttons = MMIO(btn_addr, 8)

# Read the GPIO2_DATA register (offset 0x0 of the buttons window)
print(f"buttons: {buttons.read()}")

# Write 0x00 to (base_addr + 0x4) GPIO_TRI register to configure GPIO for output
leds.write(0x4, 0x0)
# Write 0xF to (base_addr + 0x0) GPIO_DATA register to output to GPIO
leds.write(0x0, 0xF)  # 4 LEDs light up
```

### Reference
- https://pynq.readthedocs.io/en/v2.1/pynq_libraries/mmio.html

## ZYNQ Custom IP tutorial (aka hw3-2)

### Package a custom IP
- Create an empty Xilinx project
- Add the **adder_2b.v** file to the design sources

```verilog=1
module adder_2b(clk, rst, in_4b, sw0, out_4b);
input clk,rst;
input [3:0]in_4b;
input sw0;
output reg [3:0]out_4b;

always@(*)begin
    if(sw0)
        out_4b = {2'b0, in_4b[3:2]} + {2'b0, in_4b[1:0]};
    else
        out_4b = {2'b0, in_4b[3:2]} - {2'b0, in_4b[1:0]};
end
endmodule
```

- Save the project, run **synthesis**
- **Tools->Create and Package New IP**
- **Package your current project**
- Choose the **Review and Package** tab, click on **Package IP** to create a new package

### Steps
- Right click and choose **IP Settings**
- **IP->Repository**: choose the path where you saved your packaged IP

#### Create Block Design
- ![](https://i.imgur.com/bSuCfb0.png)
- ![](https://i.imgur.com/Ee0rzC9.png)

#### Configure AXI GPIO
- GPIO is connected to the output of our custom IP
- GPIO2 is connected to the onboard LED
- ![](https://i.imgur.com/FCgXJz4.png)
- ![](https://i.imgur.com/gRB08Z5.png)

#### Package custom IP
- Remember to run **Connection Automation** and **Block Automation**
- Click on the **Sources** tab, right click on our **block design (design_1)**, and choose **Create HDL Wrapper**
- Generate the bitstream file and upload it to the board

#### PYNQ code
```python=
from pynq import Overlay, MMIO
import time

# Program the FPGA from ARM processor
axi_gpio_design = Overlay("./hw3_2.bit")

# gp0_address is the physical base address mapped to AXI GPIO
gp0_address = axi_gpio_design.ip_dict['axi_gpio_0']['phys_addr']

# Compute memory-mapped address for GPIO (adder output) and GPIO2 (LEDs)
gpio_addr = gp0_address
gpio1_addr = gp0_address + 8

# MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
gpio = MMIO(gpio_addr, 8)
gpio1 = MMIO(gpio1_addr, 8)

# Write 0x0 to the GPIO2_TRI register to enable output (only needed once)
gpio1.write(0x4, 0x0)

# Read the result from adder
adder_res_prev = gpio.read()

while True:
    time.sleep(0.1)
    # Read from GPIO_DATA register
    adder_res = gpio.read()
    # Write the result from adder IP to the GPIO2_DATA register
    gpio1.write(0x0, adder_res)
    # Refer to AXI GPIO v2.0 LogiCORE IP Product Guide (PG144)
    # (https://www.xilinx.com/support/documentation/ip_documentation/axi_gpio/v2_0/pg144-axi-gpio.pdf)
    # for details
    if adder_res != 0 and adder_res != adder_res_prev:
        print("Adder IP output ", adder_res)
    adder_res_prev = adder_res
```

## ZYNQ CDMA tutorial (aka hw4)

### Overview
- ![](https://i.imgur.com/IYLX9r3.png)

### CDMA
The Xilinx **AXI Central Direct Memory Access (CDMA)** core is a soft Xilinx Intellectual Property (IP). The AXI CDMA provides **high-bandwidth** Direct Memory Access (DMA) between a memory-mapped source address and a memory-mapped destination address using the AXI4 protocol. An optional **Scatter Gather (SG)** feature can be used to offload control and sequencing tasks from the system CPU.

- CDMA register address map
    - ![](https://i.imgur.com/kD48PFM.png)
    - ![](https://i.imgur.com/onDHKBM.png)
    - ![](https://i.imgur.com/TL3t780.png)
    - ![](https://i.imgur.com/Klb8ija.png)
- These are some important CDMA registers
    - CDMACR
        - ![](https://i.imgur.com/ZZ0Dage.png)
        - ![](https://i.imgur.com/uaf0HGw.png)
        - ![](https://i.imgur.com/DLeINE0.png)
        - ![](https://i.imgur.com/uj0s9dx.png)
    - SA
        - ![](https://i.imgur.com/3zZc41T.png)
    - DA
        - ![](https://i.imgur.com/O50dKh3.png)
    - BTT
        - ![](https://i.imgur.com/MhSkY4q.png)

### Block memory
The Xilinx **Block Memory Generator (BMG)** core is an advanced memory constructor that generates area and performance-optimized memories using **embedded block RAM resources** in Xilinx FPGAs. The BMG core supports both **Native and AXI4 interfaces.** The AXI4 interface configuration of the BMG core is derived from the Native interface BMG configuration and adds an industry-standard bus protocol interface to the core. Two AXI4 interface styles are available: **AXI4 and AXI4-Lite**.

- ![](https://i.imgur.com/hsTXmcp.png)
- ![](https://i.imgur.com/ouECXU8.png)

### Steps
#### Create custom IP (16 bit multiply)
- This IP reads the data at 0x00 and 0x04 from BRAM, multiplies the two values, and stores the result to 0x0C.
Reading 0x0C from BRAM gives you the multiplication result

```verilog=1
module mul16(rst, clk, R_req, addr, R_data, W_req, W_data);
input rst;
input clk;
output R_req;
output [31:0] addr;
input [31:0] R_data;
output [3:0] W_req;
output [31:0] W_data;

wire w_r;
reg [1:0] C_state;
wire [1:0] N_state;
reg [31:0] indata [1:0];

assign W_data = indata[0][15:0] * indata[1][15:0];
assign N_state = C_state + 1;
assign w_r = C_state[0] & C_state[1];
assign R_req = 1;
assign W_req = {w_r, w_r, w_r, w_r};
assign addr = {28'b0,C_state,2'b0};

always@(posedge clk or negedge rst)begin
    if(!rst)begin
        C_state <= 0;
        indata[0] <= 0;
        indata[1] <= 0;
    end
    else begin
        C_state <= N_state;
        indata[N_state[0]]<=R_data;
    end
end
endmodule
```

#### Create block diagram
- ![](https://i.imgur.com/RQTcVK3.png)
- ![](https://i.imgur.com/QxuMZql.png)
- Add slave port for ZYNQ
    - ![](https://i.imgur.com/9UbV0po.png)

#### Change the memory mapped address (the default address will write to system protected memory)
- ![](https://i.imgur.com/MRh9906.png)
- Configure Block memory to **True Dual Port RAM** since both the **AXI BRAM controller** and the **mul16 IP** write to BRAM
    - ![](https://i.imgur.com/lWqmDeO.png)
- Create the bitstream and upload it to jupyter notebook

#### PYNQ code
```python=
from pynq import Overlay, MMIO

# Program the FPGA from ARM processor
axi_cdma_design = Overlay("./hw4.bit")

# cdma_address is the physical base address mapped to AXI CDMA
# By writing to these addresses, we can control the behavior of CDMA
cdma_address = axi_cdma_design.ip_dict['axi_cdma_0']['phys_addr']

# mem_addr is the main memory address controlled by ZYNQ
mem_addr = 0x30000000
# bram_addr is the mmap Block RAM address
bram_addr = 0xc0000000

# MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
# in_addr is the input address space MMIO object.
# We allocate 16 bytes of memory because our custom IP takes two input values
# at 0x0 and 0x4 respectively, then stores the result back to 0xC.
in_addr = MMIO(mem_addr, 16)
# cdma is the CDMA address space MMIO object.
# We allocate 44 bytes because the highest register we write is BTT,
# which occupies 4 bytes at (base_addr + 0x28)
cdma = MMIO(cdma_address, 44)

# Place the two numbers (20, 60) into the system memory
a = 20
b = 60
# Write 0x14=20 to (in_addr + 0)
in_addr.write(0x0, a)
# Write 0x3c=60 to (in_addr + 4)
in_addr.write(0x4, b)

# CDMA move data from system memory to BRAM
# Write 0x04 to CDMACR (offset: 0x00) to soft reset CDMA
cdma.write(0x00, 0x04)
# Write 0x30000000 to SA (offset: 0x18) to set source address for CDMA
cdma.write(0x18, mem_addr)
# Write 0xc0000000 to DA (offset: 0x20) to set destination address for CDMA
cdma.write(0x20, bram_addr)
# Write 0x08 to BTT (offset: 0x28) to set the number of bytes to transfer
cdma.write(0x28, 0x08)

# CDMA move data from BRAM to system memory
# Note: nothing here waits for the first transfer to finish before the reset below.
# Polling the Idle flag (bit 1) of CDMASR (offset 0x04) would make that explicit.
cdma.write(0x00, 0x04)
cdma.write(0x18, bram_addr)
cdma.write(0x20, mem_addr)
# Write 0x10 (16 bytes) to BTT (offset: 0x28) to set the number of bytes to transfer
cdma.write(0x28, 0x10)

print(f"ans of {a}*{b} = {in_addr.read(0xC)}")  # 1200
```

## ZYNQ conv tutorial (aka hw5)

### Brief architecture
- The CPU system connects with BRAM and CDMA through the AXI bus. Our custom convolution IP is connected directly to BRAM.

![](https://i.imgur.com/lrIxiDi.png)

### Vivado architecture
- CPU system, CDMA, BRAM controller, CDMA controller are all connected through the AXI interconnect
- BRAM itself is connected directly to the BRAM controller and the convolutional unit. Notice that there should be 2 BRAMs (namely BRAM0 and BRAM1), and each one needs a BRAM controller.

![](https://i.imgur.com/WOvmfKu.png)

### Actual block design in Vivado
- In this system, there are 2 AXI GPIO blocks, 2 BRAM controllers, 2 BRAMs, 1 convolutional unit, 1 ZYNQ processing system, 1 CDMA unit, and 1 processor system reset.
- The 2 AXI GPIO blocks are responsible for setting the logic level of the "start" pin and reading the logic level of the "finish" pin.
- The 2 BRAM controllers connect the BRAMs to the AXI interconnect.
- The convolutional unit (**conv_v1_0**) connects directly to the 2 BRAMs.

Right click: 'view image in new tab' for larger image

![](https://i.imgur.com/bT0MUmB.png)

### Address mapping
![](https://i.imgur.com/K10Gupz.png)

### Python program
```python=
from pynq import Overlay, MMIO
import time

# Program the FPGA from ARM processor
conv_design = Overlay("./hw5.bit")

# cdma_addr is the physical base address mapped to AXI CDMA
# By writing to these addresses, we can control the behavior of CDMA
cdma_addr = conv_design.ip_dict['axi_cdma_0']['phys_addr']
# gpio0_address is the physical address mapped to AXI GPIO 0
gpio0_address = conv_design.ip_dict['axi_gpio_0']['phys_addr']
# gpio1_address is the physical address mapped to AXI GPIO 1
gpio1_address = conv_design.ip_dict['axi_gpio_1']['phys_addr']

# Compute memory-mapped address for GPIO
start_conv_addr = gpio0_address + 0  # MMIO address

# mem_addr is the main memory address controlled by ZYNQ
mem_addr = 0x30000000
# bram0_addr is the mmap address of BRAM0
bram0_addr = 0xC0000000
# bram1_addr is the mmap address of BRAM1
bram1_addr = 0xC2000000

# MMIO(IP_BASE_ADDRESS, ADDRESS_RANGE)
# in_mem is the input address space MMIO object.
# We allocate 3176(=794*4) bytes of memory for input
in_bytes = 3176
in_mem = MMIO(mem_addr, in_bytes)
# The output occupies 2704(=26*26*4) bytes
out_bytes = 2704
# cdma is the CDMA address space MMIO object.
# 44 bytes cover the registers up to BTT (offset 0x28, 4 bytes wide)
cdma = MMIO(cdma_addr, 44)
# start is the GPIO address space MMIO object.
# We allocate 8 bytes of memory
start = MMIO(start_conv_addr, 8)

# Read input from input.hex
with open('input.hex', 'r') as input_file:
    in_lines = input_file.readlines()

raddr = 0
for line in in_lines:
    hex_string = line[0:8]
    to_int = int(hex_string, 16)
    # Write to memory
    in_mem.write(raddr, to_int)
    # print(f"Writing {hex_string} to addr {raddr}")
    raddr += 4

# CDMA move data from system memory to BRAM0
# Write 0x04 to CDMACR (offset: 0x00) to soft reset CDMA
cdma.write(0x00, 0x04)
# Write 0x30000000 to SA (offset: 0x18) to set source address for CDMA
cdma.write(0x18, mem_addr)
# Write 0xC0000000 to DA (offset: 0x20) to set destination address for CDMA
cdma.write(0x20, bram0_addr)
# Write in_bytes to BTT (offset: 0x28) to set the number of bytes to transfer
cdma.write(0x28, in_bytes)

# GPIO starts the conv custom IP
# Write 0x00 to (base_addr + 0x4) GPIO_TRI register to configure GPIO for output
start.write(0x4, 0x0)
# Write 0xF to (base_addr + 0x0) GPIO_DATA register to output to GPIO
start.write(0x0, 0xF)  # GPIO output 1
time.sleep(0.5)  # This is unnecessary, just for visual demonstration
start.write(0x0, 0x0)  # GPIO output 0

# CDMA move data from BRAM1 to system memory
cdma.write(0x00, 0x04)
cdma.write(0x18, bram1_addr)
cdma.write(0x20, mem_addr)
# Write out_bytes to BTT (offset: 0x28) to set the number of bytes to transfer
cdma.write(0x28, out_bytes)

# Read the expected output from golden.hex
with open('golden.hex', 'r') as golden_file:
    g_lines = golden_file.readlines()

err = 0
addr = 0
for line in g_lines:
    expect = hex(int(line[0:8], 16))
    # Read the result back from memory
    output = hex(in_mem.read(addr))
    if output != expect:
        print(f"[Error] Output is {output} expected {expect} at addr {addr}")
        err += 1
    addr += 4

if err == 0:
    print("Congratulations! All pass")
else:
    print("Error! Data mismatch")
```

## CNN system (hw7)
- Actual block design in Vivado
    - ![](https://i.imgur.com/ZEPN3EY.png)

### Address mapping
- ![](https://i.imgur.com/OthAOQs.png)

### Python program
```python=
from pynq import Overlay

# Program the FPGA from ARM processor
conv_design = Overlay("./hw7.bit")
```
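
The hw4 and hw5 programs repeat the same CDMA register sequence (soft reset, SA, DA, BTT). If the hw7 design also contains an AXI CDMA, which is an assumption since the block design is only shown as an image, that sequence could be wrapped in a small helper. The sketch below is not part of the original homework code: the name `cdma_copy`, its `timeout_s` parameter, and the constant names are hypothetical. The offsets 0x00, 0x18, 0x20 and 0x28 are the ones used in the programs above; the status register CDMASR at offset 0x04 with its Idle flag in bit 1 is taken from the AXI CDMA product guide (PG034). The helper accepts any object with `read(offset)` and `write(offset, value)` methods, such as a PYNQ `MMIO` object.

```python
import time

# Register offsets used in the hw4/hw5 programs, plus CDMASR from PG034
CDMACR = 0x00  # control register
CDMASR = 0x04  # status register (not used in the programs above)
SA = 0x18      # source address
DA = 0x20      # destination address
BTT = 0x28     # bytes to transfer; writing it starts the transfer

CR_RESET = 0x04  # soft-reset value written to CDMACR in the programs above
SR_IDLE = 0x02   # Idle flag (bit 1) of CDMASR


def cdma_copy(cdma, src, dst, nbytes, timeout_s=1.0):
    """Hypothetical helper: copy nbytes from src to dst with a simple CDMA transfer.

    cdma is any object with read(offset) and write(offset, value),
    for example a pynq MMIO object mapped over the CDMA registers.
    """
    # Same four writes as in the hw4/hw5 programs
    cdma.write(CDMACR, CR_RESET)
    cdma.write(SA, src)
    cdma.write(DA, dst)
    cdma.write(BTT, nbytes)

    # Poll the Idle flag instead of assuming the transfer has already finished
    deadline = time.monotonic() + timeout_s
    while not (cdma.read(CDMASR) & SR_IDLE):
        if time.monotonic() > deadline:
            raise TimeoutError("CDMA did not return to idle")
```

With the names from the hw5 program, the first transfer would read `cdma_copy(cdma, mem_addr, bram0_addr, in_bytes)`. The timeout is there so that a failed transfer raises an error rather than hanging the notebook.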