# Extend [PicoRV32](https://github.com/YosysHQ/picorv32)

> 黃若綾, 蔡雅彤, 林靖婷 [GitHub](https://github.com/tyt017/picorv32/tree/main)

## Objective

1. Explore the internal implementation of PicoRV32
2. Implement and verify bitmanip extensions Zba, Zbb, Zbc, and Zbs
3. Implement static/dynamic branch prediction

## Overview of PicoRV32

- Small (750-2000 LUTs in 7-Series Xilinx Architecture)
- High $f_{max}$ (250-450 MHz on 7-Series Xilinx FPGAs)
- Selectable native memory interface or AXI4-Lite master
- Optional IRQ support (using a simple custom ISA)
- Optional Co-Processor Interface

This CPU is designed as an auxiliary processor in FPGA designs and ASICs. It supports a wide range of configurations for flexibility in performance, size, and feature set. Example configurations include:

- **RV32E Core:** Disable registers x16..x31 and optional instructions for a smaller footprint.
- **Single-Port Register File:** Reduces size but sacrifices performance.

PicoRV32 comes in three core variations:

- `picorv32`: Simple native memory interface.
- `picorv32_axi`: AXI4-Lite Master interface for compatibility with AXI-based systems.
- `picorv32_wb`: Wishbone Master interface.

Additional modules include an AXI4 adapter and PCPI cores for implementing custom instructions.

### Files in this Repository

#### Core Modules

| Module | Description |
|--------------------------|-------------------------------------------------------|
| `picorv32` | The PicoRV32 CPU |
| `picorv32_axi` | CPU with AXI4-Lite interface |
| `picorv32_axi_adapter` | Adapter from PicoRV32 Memory Interface to AXI4-Lite |
| `picorv32_wb` | CPU with Wishbone Master interface |
| `picorv32_pcpi_mul` | PCPI core implementing `MUL[H[SU\|U]]` instructions |
| `picorv32_pcpi_fast_mul` | Single-cycle multiplier version of `pcpi_mul` |
| `picorv32_pcpi_div` | PCPI core implementing `DIV[U]/REM[U]` instructions |

#### Other Files

- **Makefile and Testbenches**: Test environments with multiple configurations.
- **`firmware/`**: Simple test firmware for IRQ handling and PCPI cores.
- **`tests/`**: Instruction-level tests from riscv-tests.
- **`dhrystone/`**: Dhrystone benchmark.
- **`picosoc/`**: Example SoC using PicoRV32.
- **`scripts/`**: Synthesis and hardware configuration scripts.

---

## [Install and Build the Environment of PicoRV32](https://github.com/YosysHQ/picorv32?tab=readme-ov-file#building-a-pure-rv32i-toolchain)

### Clone and build PicoRV32

``` bash
git clone https://github.com/YosysHQ/picorv32.git
cd picorv32
make download-tools
make -j8 build-tools
```

### Install the required packages

These packages are prerequisites for building the RISC-V toolchain:

``` bash
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ninja-build
```

### Build the complete RISC-V GNU toolchain

The RISC-V GNU toolchain and libraries will be installed in `/opt/riscv32i`:

```bash
sudo mkdir /opt/riscv32i
sudo chown $USER /opt/riscv32i
git clone https://github.com/riscv/riscv-gnu-toolchain riscv-gnu-toolchain-rv32i
cd riscv-gnu-toolchain-rv32i
git submodule update --init --recursive
mkdir build; cd build
../configure --with-arch=rv32i --prefix=/opt/riscv32i
make -j8
```

### Check

Run `make test_vcd` in the `picorv32` folder. The output should look like this:

```bash
iverilog -o testbench.vvp -DCOMPRESSED_ISA testbench.v picorv32.v
chmod -x testbench.vvp
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc -o firmware/start.o firmware/start.S
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/irq.o firmware/irq.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align
-Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/print.o firmware/print.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/hello.o firmware/hello.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/sieve.o firmware/sieve.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/multest.o firmware/multest.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/stats.o firmware/stats.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o tests/addi.o -DTEST_FUNC_NAME=addi \ -DTEST_FUNC_TXT='"addi"' -DTEST_FUNC_RET=addi_ret tests/addi.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o tests/add.o -DTEST_FUNC_NAME=add \ -DTEST_FUNC_TXT='"add"' -DTEST_FUNC_RET=add_ret tests/add.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im -o tests/andi.o -DTEST_FUNC_NAME=andi \ -DTEST_FUNC_TXT='"andi"' -DTEST_FUNC_RET=andi_ret tests/andi.S ... 
-DTEST_FUNC_TXT='"xor"' -DTEST_FUNC_RET=xor_ret tests/xor.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -Os -mabi=ilp32 -march=rv32imc -ffreestanding -nostdlib -o firmware/firmware.elf \ -Wl,--build-id=none,-Bstatic,-T,firmware/sections.lds,-Map,firmware/firmware.map,--strip-debug \ firmware/start.o firmware/irq.o firmware/print.o firmware/hello.o firmware/sieve.o firmware/multest.o firmware/stats.o tests/addi.o tests/add.o tests/andi.o tests/and.o tests/auipc.o tests/beq.o tests/bge.o tests/bgeu.o tests/blt.o tests/bltu.o tests/bne.o tests/div.o tests/divu.o tests/jalr.o tests/jal.o tests/j.o tests/lb.o tests/lbu.o tests/lh.o tests/lhu.o tests/lui.o tests/lw.o tests/mulh.o tests/mulhsu.o tests/mulhu.o tests/mul.o tests/ori.o tests/or.o tests/rem.o tests/remu.o tests/sb.o tests/sh.o tests/simple.o tests/slli.o tests/sll.o tests/slti.o tests/slt.o tests/srai.o tests/sra.o tests/srli.o tests/srl.o tests/sub.o tests/sw.o tests/xori.o tests/xor.o -lgcc /opt/riscv32i/lib/gcc/riscv32-unknown-elf/14.2.0/../../../../riscv32-unknown-elf/bin/ld: warning: firmware/firmware.elf has a LOAD segment with RWX permissions chmod -x firmware/firmware.elf /opt/riscv32i/bin/riscv32-unknown-elf-objcopy -O binary firmware/firmware.elf firmware/firmware.bin chmod -x firmware/firmware.bin python3 firmware/makehex.py firmware/firmware.bin 32768 > firmware/firmware.hex vvp -N testbench.vvp +vcd +trace +noerror VCD info: dumpfile testbench.vcd opened for output. hello world lui..OK auipc..OK j..OK jal..OK jalr..OK beq..OK bne..OK blt..OK bge..OK bltu..OK bgeu..OK lb..OK lh..OK lw..OK lbu..OK lhu..OK sb..OK sh..OK sw..OK addi..OK slti..OK xori..OK ori..OK andi..OK slli..OK srli..OK srai..OK add..OK sub..OK sll..OK slt..OK xor..OK srl..OK sra..OK or..OK and..OK mulh..OK mulhsu..OK mulhu..OK mul..OK div..OK divu..OK rem..OK remu..OK simple..OK 1st prime is 2. 2nd prime is 3. ... 30th prime is 113. 31st prime is 127. 
checksum: 1772A48F OK
input [FFFFFFFF] 80000000 [FFFFFFFF] FFFFFFFF
hard mul 80000000 00000000 80000000 7FFFFFFF
soft mul 80000000 00000000 80000000 7FFFFFFF OK
hard div 80000000 00000000 00000000 80000000
soft div 80000000 00000000 00000000 80000000 OK
input [00000000] 00000000 [00000000] 00000000
hard mul 00000000 00000000 00000000 00000000
...
hard div FFFFFFFF 00000000 1B9D5F9C 38BAA671
soft div FFFFFFFF 00000000 1B9D5F9C 38BAA671 OK
Cycle counter ......... 484187
Instruction counter ... 105596
CPI: 4.58
DONE
------------------------------------------------------------
EBREAK instruction at 0x0000072A
pc 0000072D x8 00000000 x16 1B639DFB x24 00000000
x1 000006FC x9 00000000 x17 1B639DFB x25 00000000
x2 00020000 x10 20000000 x18 00000000 x26 00000000
x3 DEADBEEF x11 075BCD15 x19 00003A94 x27 00000000
x4 DEADBEEF x12 0000004F x20 00000000 x28 38BAA671
x5 0000108E x13 0000004E x21 00000000 x29 00000001
x6 00000000 x14 00000045 x22 00000000 x30 00000000
x7 00000000 x15 0000000A x23 00000000 x31 00000000
------------------------------------------------------------
Number of fast external IRQs counted: 60
Number of slow external IRQs counted: 7
Number of timer IRQs counted: 22
Finished writing testbench.trace.
TRAP after 526389 clock cycles
ALL TESTS PASSED.
```

Run `gtkwave testbench.vcd` to inspect the waveform:

![image](https://hackmd.io/_uploads/HyOQlG5D1l.png)

## RISC-V Bit-Manipulation ISA-Extensions

These extensions are intended to provide some combination of code-size reduction, performance improvement, and energy reduction. Grouped by the kind of operation they perform, they are divided into four extensions: Zba, Zbb, Zbc, and Zbs. Details can be found in the [specification](https://www.ece.lsu.edu/ee4720/doc/riscv-bitmanip-1.0.0.pdf).
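To make the operations listed in the tables concrete, here is a minimal sketch of plain-C reference models for a representative subset of the RV32 Zba, Zbb, Zbc, and Zbs instructions. It is not part of PicoRV32 or riscv-tests: the `rv_*` function names are hypothetical, XLEN is assumed to be 32, and the semantics are transcribed from the operation descriptions in the bitmanip 1.0.0 specification. A model like this could serve as a golden reference when checking values produced by the RTL.

```c
#include <stdint.h>

/* Hypothetical reference models (rv_* names are made up for this sketch)
 * for a subset of the RV32 bit-manipulation instructions. XLEN = 32. */

/* Zba: rd = rs2 + (rs1 << n) */
uint32_t rv_sh1add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 1); }
uint32_t rv_sh2add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 2); }
uint32_t rv_sh3add(uint32_t rs1, uint32_t rs2) { return rs2 + (rs1 << 3); }

/* Zbb: logic with an inverted operand */
uint32_t rv_andn(uint32_t rs1, uint32_t rs2) { return rs1 & ~rs2; }
uint32_t rv_orn(uint32_t rs1, uint32_t rs2) { return rs1 | ~rs2; }
uint32_t rv_xnor(uint32_t rs1, uint32_t rs2) { return ~(rs1 ^ rs2); }

/* Zbb: rotates use only the low 5 bits of rs2; the "& 31" on the
 * complementary shift also avoids an undefined shift by 32 in C */
uint32_t rv_rol(uint32_t rs1, uint32_t rs2) {
    uint32_t sh = rs2 & 31;
    return (rs1 << sh) | (rs1 >> ((32 - sh) & 31));
}
uint32_t rv_ror(uint32_t rs1, uint32_t rs2) {
    uint32_t sh = rs2 & 31;
    return (rs1 >> sh) | (rs1 << ((32 - sh) & 31));
}

/* Zbb: counting; clz and ctz return 32 for an input of 0 */
uint32_t rv_clz(uint32_t x) {
    uint32_t n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) n++;
    return n;
}
uint32_t rv_ctz(uint32_t x) {
    uint32_t n = 0;
    for (uint32_t bit = 1u; bit != 0 && (x & bit) == 0; bit <<= 1) n++;
    return n;
}
uint32_t rv_cpop(uint32_t x) {
    uint32_t n = 0;
    for (; x != 0; x &= x - 1) n++; /* clear the lowest set bit each step */
    return n;
}

/* Zbb: byte-granule operations */
uint32_t rv_orc_b(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i += 8)
        if ((x >> i) & 0xFFu) r |= 0xFFu << i; /* non-zero byte -> 0xFF */
    return r;
}
uint32_t rv_rev8(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Zbc: XOR-accumulate shifted copies of a (no carries propagate).
 * clmul = bits 31..0, clmulh = bits 63..32, clmulr = bits 62..31
 * of the 64-bit carry-less product. */
uint32_t rv_clmul(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u) r ^= a << i;
    return r;
}
uint32_t rv_clmulh(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 1; i < 32; i++)
        if ((b >> i) & 1u) r ^= a >> (32 - i);
    return r;
}
uint32_t rv_clmulr(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u) r ^= a >> (31 - i);
    return r;
}

/* Zbs: the bit index is rs2 (or the immediate) modulo 32 */
uint32_t rv_bset(uint32_t rs1, uint32_t rs2) { return rs1 | (1u << (rs2 & 31)); }
uint32_t rv_bclr(uint32_t rs1, uint32_t rs2) { return rs1 & ~(1u << (rs2 & 31)); }
uint32_t rv_binv(uint32_t rs1, uint32_t rs2) { return rs1 ^ (1u << (rs2 & 31)); }
uint32_t rv_bext(uint32_t rs1, uint32_t rs2) { return (rs1 >> (rs2 & 31)) & 1u; }
```

For example, `rv_sh2add(i, base)` computes `base + 4*i`, the address of element `i` in an array of 32-bit words, which is the address-generation pattern Zba targets. Because `clmulr` returns bits 62..31 of the carry-less product, `rv_clmulr(a, b)` equals `(rv_clmulh(a, b) << 1) | (rv_clmul(a, b) >> 31)` for any operands.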
| Extension | Operation |
| :-------: | :------------------------------: |
| Zba | Address generation instructions |
| Zbb | Basic bit-manipulation |
| Zbc | Carry-less multiplication |
| Zbs | Single-bit instructions |

### Zba Extension

Zba extension for RV32 includes the following instructions:

| Mnemonic | Instruction | Type |
| -------- | ----------- | ---- |
| sh1add rd, rs1, rs2 | Shift left by 1 and add | R-type |
| sh2add rd, rs1, rs2 | Shift left by 2 and add | R-type |
| sh3add rd, rs1, rs2 | Shift left by 3 and add | R-type |

### Zbb Extension

Zbb extension for RV32 includes the following instructions:

| Mnemonic | Instruction | Type |
| -------- | ----------- | ---- |
| andn rd, rs1, rs2 | AND with inverted operand | R-type |
| orn rd, rs1, rs2 | OR with inverted operand | R-type |
| xnor rd, rs1, rs2 | Exclusive NOR | R-type |
| max rd, rs1, rs2 | Maximum | R-type |
| maxu rd, rs1, rs2 | Unsigned maximum | R-type |
| min rd, rs1, rs2 | Minimum | R-type |
| minu rd, rs1, rs2 | Unsigned minimum | R-type |
| rol rd, rs1, rs2 | Rotate left (Register) | R-type |
| ror rd, rs1, rs2 | Rotate right (Register) | R-type |
| clz rd, rs | Count leading zero bits | I-type |
| ctz rd, rs | Count trailing zero bits | I-type |
| cpop rd, rs | Count set bits | I-type |
| sext.b rd, rs | Sign-extend byte | I-type |
| sext.h rd, rs | Sign-extend halfword | I-type |
| zext.h rd, rs | Zero-extend halfword | R-type |
| rori rd, rs1, shamt | Rotate right (Immediate) | I-type |
| orc.b rd, rs | Bitwise OR-Combine, byte granule | I-type |
| rev8 rd, rs | Byte-reverse register | I-type |

### Zbc Extension

Zbc extension for RV32 includes the following instructions:

| Mnemonic | Instruction | Type |
| -------- | ----------- | ---- |
| clmul rd, rs1, rs2 | Carry-less multiply (low-part) | R-type |
| clmulh rd, rs1, rs2 | Carry-less multiply (high-part) | R-type |
| clmulr rd, rs1, rs2 | Carry-less multiply (reversed) | R-type |

### Zbs Extension

Zbs extension for RV32 includes the following instructions:

| Mnemonic | Instruction | Type |
| -------- | ----------- | ---- |
| bclr rd, rs1, rs2 | Single-Bit Clear (Register) | R-type |
| bext rd, rs1, rs2 | Single-Bit Extract (Register) | R-type |
| binv rd, rs1, rs2 | Single-Bit Invert (Register) | R-type |
| bset rd, rs1, rs2 | Single-Bit Set (Register) | R-type |
| bclri rd, rs1, imm | Single-Bit Clear (Immediate) | I-type |
| bexti rd, rs1, imm | Single-Bit Extract (Immediate) | I-type |
| binvi rd, rs1, imm | Single-Bit Invert (Immediate) | I-type |
| bseti rd, rs1, imm | Single-Bit Set (Immediate) | I-type |

## Validation

Download the official tests from [riscv-tests](https://github.com/riscv-software-src/riscv-tests). The `isa` folder contains the test cases for the Zba, Zbb, Zbc, and Zbs extensions (`rv32uzba`, `rv32uzbb`, `rv32uzbc`, `rv32uzbs`). Copy the assembly files into `picorv32/tests/` (some files need to be adapted from their 64-bit versions). Then run `make test_vcd` to check whether the added instructions operate correctly. In the logs below, lines prefixed with `+` mark the newly added tests.

### Zba Extension

```diff
iverilog -o testbench.vvp -DCOMPRESSED_ISA testbench.v picorv32.v
chmod -x testbench.vvp
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbb -o firmware/start.o firmware/start.S
...
lui..OK
auipc..OK
...
remu..OK
+sh1add..OK
+sh2add..OK
+sh3add..OK
simple..OK
...
1st prime is 2.
2nd prime is 3.
...
31st prime is 127.
...
Cycle counter ......... 467924
Instruction counter ... 102217
CPI: 4.57
DONE
------------------------------------------------------------
EBREAK instruction at 0x0000074E
pc 00000751 x8 00000000 x16 F98C5E4E x24 00000000
x1 00000720 x9 00000000 x17 1B639DFB x25 00000000
x2 00020000 x10 20000000 x18 00000000 x26 00000000
x3 DEADBEEF x11 075BCD15 x19 00003A7C x27 00000000
x4 DEADBEEF x12 0000004F x20 00000000 x28 1B639DFB
x5 0000107C x13 0000004E x21 00000000 x29 38BAA670
x6 1B639DFB x14 00000045 x22 00000000 x30 00000000
x7 00000000 x15 0000000A x23 00000000 x31 00000000
------------------------------------------------------------
Number of fast external IRQs counted: 58
Number of slow external IRQs counted: 7
Number of timer IRQs counted: 25
Finished writing testbench.trace.
TRAP after 509379 clock cycles
ALL TESTS PASSED.
```

### Zbb Extension

```diff
iverilog -o testbench.vvp -DCOMPRESSED_ISA testbench.v picorv32.v
chmod -x testbench.vvp
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbb -o firmware/start.o firmware/start.S
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/irq.o firmware/irq.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/print.o firmware/print.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/hello.o firmware/hello.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc
-c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/sieve.o firmware/sieve.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/multest.o firmware/multest.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32ic_zbb -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/stats.o firmware/stats.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zbb -o tests/addi.o -DTEST_FUNC_NAME=addi \ -DTEST_FUNC_TXT='"addi"' -DTEST_FUNC_RET=addi_ret tests/addi.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zbb -o tests/add.o -DTEST_FUNC_NAME=add \ -DTEST_FUNC_TXT='"add"' -DTEST_FUNC_RET=add_ret tests/add.S ... 
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zbb -o tests/zext_h.o -DTEST_FUNC_NAME=zext_h \ -DTEST_FUNC_TXT='"zext_h"' -DTEST_FUNC_RET=zext_h_ret tests/zext_h.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -Os -mabi=ilp32 -march=rv32imc -ffreestanding -nostdlib -o firmware/firmware.elf \ -Wl,--build-id=none,-Bstatic,-T,firmware/sections.lds,-Map,firmware/firmware.map,--strip-debug \ firmware/start.o firmware/irq.o firmware/print.o firmware/hello.o firmware/sieve.o firmware/multest.o firmware/stats.o tests/addi.o tests/add.o tests/andi.o tests/andn.o tests/and.o tests/auipc.o tests/beq.o tests/bge.o tests/bgeu.o tests/blt.o tests/bltu.o tests/bne.o tests/clz.o tests/cpop.o tests/ctz.o tests/div.o tests/divu.o tests/jalr.o tests/jal.o tests/j.o tests/lb.o tests/lbu.o tests/lh.o tests/lhu.o tests/lui.o tests/lw.o tests/max.o tests/maxu.o tests/min.o tests/minu.o tests/mulh.o tests/mulhsu.o tests/mulhu.o tests/mul.o tests/orc_b.o tests/ori.o tests/orn.o tests/or.o tests/rem.o tests/remu.o tests/rev8.o tests/rol.o tests/rori.o tests/ror.o tests/sb.o tests/sext_b.o tests/sext_h.o tests/sh.o tests/simple.o tests/slli.o tests/sll.o tests/slti.o tests/slt.o tests/srai.o tests/sra.o tests/srli.o tests/srl.o tests/sub.o tests/sw.o tests/xnor.o tests/xori.o tests/xor.o tests/zext_h.o -lgcc /opt/riscv32i/lib/gcc/riscv32-unknown-elf/14.2.0/../../../../riscv32-unknown-elf/bin/ld: warning: firmware/firmware.elf has a LOAD segment with RWX permissions chmod -x firmware/firmware.elf /opt/riscv32i/bin/riscv32-unknown-elf-objcopy -O binary firmware/firmware.elf firmware/firmware.bin chmod -x firmware/firmware.bin python3 firmware/makehex.py firmware/firmware.bin 32768 > firmware/firmware.hex vvp -N testbench.vvp +vcd +trace +noerror VCD info: dumpfile testbench.vcd opened for output. hello world lui..OK auipc..OK ... 
simple..OK
+andn..OK
+orn..OK
+xnor..OK
+max..OK
+maxu..OK
+min..OK
+minu..OK
+rol..OK
+ror..OK
+clz..OK
+ctz..OK
+cpop..OK
+sext_b..OK
+sext_h..OK
+zext_h..OK
+rori..OK
+orc_b..OK
+rev8..OK
1st prime is 2.
2nd prime is 3.
...
31st prime is 127.
checksum: 1772A48F OK
input [FFFFFFFF] 80000000 [FFFFFFFF] FFFFFFFF
hard mul 80000000 00000000 80000000 7FFFFFFF
soft mul 80000000 00000000 80000000 7FFFFFFF OK
...
hard div FFFFFFFF 00000000 1B9D5F9C 38BAA671
soft div FFFFFFFF 00000000 1B9D5F9C 38BAA671 OK
Cycle counter ......... 512131
Instruction counter ... 113483
CPI: 4.51
DONE
------------------------------------------------------------
EBREAK instruction at 0x00000802
pc 00000805 x8 00000000 x16 1B639DFB x24 00000000
x1 000007D4 x9 00000000 x17 1B639DFB x25 00000000
x2 00020000 x10 20000000 x18 00000000 x26 00000000
x3 DEADBEEF x11 075BCD15 x19 0000484C x27 00000000
x4 DEADBEEF x12 0000004F x20 00000000 x28 38BAA671
x5 0000108E x13 0000004E x21 00000000 x29 00000001
x6 00000000 x14 00000045 x22 00000000 x30 00000000
x7 00000000 x15 0000000A x23 00000000 x31 00000000
------------------------------------------------------------
Number of fast external IRQs counted: 64
Number of slow external IRQs counted: 8
Number of timer IRQs counted: 32
Finished writing testbench.trace.
TRAP after 555053 clock cycles
ALL TESTS PASSED.
```

### Zbc Extension

```diff
iverilog -o testbench.vvp -DCOMPRESSED_ISA testbench.v picorv32.v
chmod -x testbench.vvp
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zba_zbc_zbs -o firmware/start.o firmware/start.S
...
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/addi.o -DTEST_FUNC_NAME=addi \
    -DTEST_FUNC_TXT='"addi"' -DTEST_FUNC_RET=addi_ret tests/addi.S
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/add.o -DTEST_FUNC_NAME=add \
    -DTEST_FUNC_TXT='"add"' -DTEST_FUNC_RET=add_ret tests/add.S
...
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/clmulh.o -DTEST_FUNC_NAME=clmulh \ -DTEST_FUNC_TXT='"clmulh"' -DTEST_FUNC_RET=clmulh_ret tests/clmulh.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/clmulr.o -DTEST_FUNC_NAME=clmulr \ -DTEST_FUNC_TXT='"clmulr"' -DTEST_FUNC_RET=clmulr_ret tests/clmulr.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/clmul.o -DTEST_FUNC_NAME=clmul \ -DTEST_FUNC_TXT='"clmul"' -DTEST_FUNC_RET=clmul_ret tests/clmul.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/div.o -DTEST_FUNC_NAME=div \ ... /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zba_zbc_zbs -o tests/xor.o -DTEST_FUNC_NAME=xor \ -DTEST_FUNC_TXT='"xor"' -DTEST_FUNC_RET=xor_ret tests/xor.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -Os -mabi=ilp32 -march=rv32imc_zba_zbc_zbs -ffreestanding -nostdlib -o firmware/firmware.elf \ -Wl,--build-id=none,-Bstatic,-T,firmware/sections.lds,-Map,firmware/firmware.map,--strip-debug \ firmware/start.o firmware/irq.o firmware/print.o firmware/hello.o firmware/sieve.o firmware/multest.o firmware/stats.o tests/addi.o tests/add.o tests/andi.o tests/and.o tests/auipc.o tests/bclri.o tests/bclr.o tests/beq.o tests/bexti.o tests/bext.o tests/bge.o tests/bgeu.o tests/binvi.o tests/binv.o tests/blt.o tests/bltu.o tests/bne.o tests/bseti.o tests/bset.o tests/clmulh.o tests/clmulr.o tests/clmul.o tests/div.o tests/divu.o tests/jalr.o tests/jal.o tests/j.o tests/lb.o tests/lbu.o tests/lh.o tests/lhu.o tests/lui.o tests/lw.o tests/mulh.o tests/mulhsu.o tests/mulhu.o tests/mul.o tests/ori.o tests/or.o tests/rem.o tests/remu.o tests/sb.o tests/sh1add.o tests/sh2add.o tests/sh3add.o tests/sh.o tests/simple.o tests/slli.o tests/sll.o tests/slti.o tests/slt.o tests/srai.o tests/sra.o tests/srli.o tests/srl.o tests/sub.o tests/sw.o tests/xori.o tests/xor.o -lgcc 
chmod -x firmware/firmware.elf /opt/riscv32i/bin/riscv32-unknown-elf-objcopy -O binary firmware/firmware.elf firmware/firmware.bin chmod -x firmware/firmware.bin python3 firmware/makehex.py firmware/firmware.bin 32768 > firmware/firmware.hex vvp -N testbench.vvp +vcd +trace +noerror VCD info: dumpfile testbench.vcd opened for output. hello world lui..OK auipc..OK ... binv..OK binvi..OK bext..OK bexti..OK bset..OK bseti..OK +clmul..OK +clmulh..OK +clmulr..OK simple..OK 1st prime is 2. 2nd prime is 3. ... 30th prime is 113. 31st prime is 127. checksum: 1772A48F OK input [FFFFFFFF] 80000000 [FFFFFFFF] FFFFFFFF hard mul 80000000 00000000 80000000 7FFFFFFF soft mul 80000000 00000000 80000000 7FFFFFFF OK hard div 80000000 00000000 00000000 80000000 soft div 80000000 00000000 00000000 80000000 OK ... hard div FFFFFFFF 00000000 1B9D5F9C 38BAA671 soft div FFFFFFFF 00000000 1B9D5F9C 38BAA671 OK Cycle counter ......... 488750 Instruction counter ... 108076 CPI: 4.52 DONE ------------------------------------------------------------ EBREAK instruction at 0x000007D2 pc 000007D5 x8 00000000 x16 F98C5E4E x24 00000000 x1 000007A4 x9 00000000 x17 1B639DFB x25 00000000 x2 00020000 x10 20000000 x18 00000000 x26 00000000 x3 DEADBEEF x11 075BCD15 x19 000070F0 x27 00000000 x4 DEADBEEF x12 0000004F x20 00000000 x28 1B639DFB x5 0000107C x13 0000004E x21 00000000 x29 38BAA670 x6 1B639DFB x14 00000045 x22 00000000 x30 00000000 x7 00000000 x15 0000000A x23 00000000 x31 00000000 ------------------------------------------------------------ Number of fast external IRQs counted: 61 Number of slow external IRQs counted: 7 Number of timer IRQs counted: 33 Finished writing testbench.trace. TRAP after 530878 clock cycles ALL TESTS PASSED. 
```

### Zbs Extension

```diff
iverilog -o testbench.vvp -DCOMPRESSED_ISA testbench.v picorv32.v
chmod -x testbench.vvp
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -o firmware/start.o firmware/start.S
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/irq.o firmware/irq.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/print.o firmware/print.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/hello.o firmware/hello.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/sieve.o firmware/sieve.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/multest.o firmware/multest.c
/opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32imc_zbs -Os --std=c99 -Werror -Wall -Wextra -Wshadow -Wundef -Wpointer-arith -Wcast-qual -Wcast-align
-Wwrite-strings -Wredundant-decls -Wstrict-prototypes -Wmissing-prototypes -pedantic -ffreestanding -nostdlib -o firmware/stats.o firmware/stats.c /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zbs -o tests/addi.o -DTEST_FUNC_NAME=addi \ -DTEST_FUNC_TXT='"addi"' -DTEST_FUNC_RET=addi_ret tests/addi.S /opt/riscv32i/bin/riscv32-unknown-elf-gcc -c -mabi=ilp32 -march=rv32im_zbs -o tests/add.o -DTEST_FUNC_NAME=add \ -DTEST_FUNC_TXT='"add"' -DTEST_FUNC_RET=add_ret tests/add.S ... /opt/riscv32i/bin/riscv32-unknown-elf-gcc -Os -mabi=ilp32 -march=rv32imc_zbs -ffreestanding -nostdlib -o firmware/firmware.elf \ -Wl,--build-id=none,-Bstatic,-T,firmware/sections.lds,-Map,firmware/firmware.map,--strip-debug \ firmware/start.o firmware/irq.o firmware/print.o firmware/hello.o firmware/sieve.o firmware/multest.o firmware/stats.o tests/addi.o tests/add.o tests/andi.o tests/and.o tests/auipc.o tests/bclri.o tests/bclr.o tests/beq.o tests/bexti.o tests/bext.o tests/bge.o tests/bgeu.o tests/binvi.o tests/binv.o tests/blt.o tests/bltu.o tests/bne.o tests/bseti.o tests/bset.o tests/div.o tests/divu.o tests/jalr.o tests/jal.o tests/j.o tests/lb.o tests/lbu.o tests/lh.o tests/lhu.o tests/lui.o tests/lw.o tests/mulh.o tests/mulhsu.o tests/mulhu.o tests/mul.o tests/ori.o tests/or.o tests/rem.o tests/remu.o tests/sb.o tests/sh.o tests/simple.o tests/slli.o tests/sll.o tests/slti.o tests/slt.o tests/srai.o tests/sra.o tests/srli.o tests/srl.o tests/sub.o tests/sw.o tests/xori.o tests/xor.o -lgcc /opt/riscv32i/lib/gcc/riscv32-unknown-elf/14.2.0/../../../../riscv32-unknown-elf/bin/ld: warning: firmware/firmware.elf has a LOAD segment with RWX permissions chmod -x firmware/firmware.elf /opt/riscv32i/bin/riscv32-unknown-elf-objcopy -O binary firmware/firmware.elf firmware/firmware.bin chmod -x firmware/firmware.bin python3 firmware/makehex.py firmware/firmware.bin 32768 > firmware/firmware.hex vvp -N testbench.vvp +vcd +trace +noerror VCD info: dumpfile testbench.vcd opened 
for output.
hello world
lui..OK
auipc..OK
...
simple..OK
+bclr..OK
+bext..OK
+binv..OK
+bset..OK
+bclri..OK
+bexti..OK
+binvi..OK
+bseti..OK
1st prime is 2.
2nd prime is 3.
...
31st prime is 127.
checksum: 1772A48F OK
input [FFFFFFFF] 80000000 [FFFFFFFF] FFFFFFFF
hard mul 80000000 00000000 80000000 7FFFFFFF
soft mul 80000000 00000000 80000000 7FFFFFFF OK
...
soft mul 1B639DFB F98C5E4E 324704BF 324704BF OK
hard div FFFFFFFF 00000000 1B9D5F9C 38BAA671
soft div FFFFFFFF 00000000 1B9D5F9C 38BAA671 OK
Cycle counter ......... 400810
Instruction counter .... 82641
CPI: 4.85
DONE
------------------------------------------------------------
EBREAK instruction at 0x0000078A
pc 0000078D x8 00000000 x16 00000000 x24 00000000
x1 0000075C x9 00000000 x17 00000000 x25 00000000
x2 00020000 x10 20000000 x18 00000000 x26 00000000
x3 DEADBEEF x11 075BCD15 x19 00006098 x27 00000000
x4 DEADBEEF x12 0000004F x20 00000000 x28 00000019
x5 0000E3B0 x13 0000004E x21 00000000 x29 00000000
x6 FF00FF00 x14 00000045 x22 00000000 x30 00000000
x7 00000000 x15 0000000A x23 00000000 x31 00000000
------------------------------------------------------------
Number of fast external IRQs counted: 49
Number of slow external IRQs counted: 6
Number of timer IRQs counted: 26
Finished writing testbench.trace.
TRAP after 434128 clock cycles
ALL TESTS PASSED.
```

### CLZ Validation

We validated the **CLZ instruction from the B extension (Zbb)** and compared it with a **CLZ implemented in software using only RV32I instructions**.

**CLZ C implementation**

Below is an optimized, branch-free CLZ function in C:

```c
#include <stdint.h>

static inline int clz32(uint32_t x) {
    int n = 0;
    // Binary search: if the upper part of the remaining window is all zeros,
    // count those bits and shift them out of the way.
    int mask = (x & 0xFFFF0000) == 0;
    n += mask * 16;
    x <<= mask * 16;
    mask = (x & 0xFF000000) == 0;
    n += mask * 8;
    x <<= mask * 8;
    mask = (x & 0xF0000000) == 0;
    n += mask * 4;
    x <<= mask * 4;
    mask = (x & 0xC0000000) == 0;
    n += mask * 2;
    x <<= mask * 2;
    mask = (x & 0x80000000) == 0;
    n += mask * 1;
    // For x == 0 the steps above give 31; adjust the result to 32
    n += (x == 0) ? (32 - n) : 0;
    return n;
}
```

**CLZ assembly**

The following is a corresponding RV32I assembly version; it uses branches instead of the mask arithmetic:

```asm
clz32:
    addi sp, sp, -16        # Allocate stack space
    sw   s0, 0(sp)          # Save s0
    sw   s1, 4(sp)          # Save s1
    li   s0, 32             # Result for x == 0
    beqz a0, end_clz        # x == 0: return 32
    li   s0, 0              # Initialize n = 0
    li   s1, 0xFFFF0000     # Check if the upper 16 bits are 0
    and  t2, a0, s1
    bnez t2, skip_16        # If the upper 16 bits are not 0, skip
    addi s0, s0, 16
    slli a0, a0, 16         # x <<= 16
skip_16:
    li   s1, 0xFF000000     # Check if the upper 8 bits are 0
    and  t2, a0, s1
    bnez t2, skip_8
    addi s0, s0, 8
    slli a0, a0, 8
skip_8:
    li   s1, 0xF0000000     # Check if the upper 4 bits are 0
    and  t2, a0, s1
    bnez t2, skip_4
    addi s0, s0, 4
    slli a0, a0, 4
skip_4:
    li   s1, 0xC0000000     # Check if the upper 2 bits are 0
    and  t2, a0, s1
    bnez t2, skip_2
    addi s0, s0, 2
    slli a0, a0, 2
skip_2:
    li   s1, 0x80000000     # Check if the highest bit is 0
    and  t2, a0, s1
    bnez t2, end_clz
    addi s0, s0, 1
end_clz:
    mv   a0, s0             # Return n
    lw   s0, 0(sp)          # Restore s0
    lw   s1, 4(sp)          # Restore s1
    addi sp, sp, 16         # Reclaim stack space
    ret
```

**Validation result**

We wrote `clztest.c` to run both versions and measure their cycle counts with `rdcycle`.

```c
/* clztest.c in picorv32/firmware */
#include "firmware.h"

int clz32(uint32_t x) {
    int n = 0;
    // Check if the upper 16 bits are 0
    int mask = ((x & 0xFFFF0000) == 0);
    n += mask << 4;  // mask * 16
    x <<= mask << 4; // x <<= (mask * 16)
    // Check if the upper 8 bits are 0
    mask = ((x & 0xFF000000) == 0);
    n += mask << 3;  // mask * 8
    x <<= mask << 3; // x <<= (mask * 8)
    // Check if the upper 4 bits are 0
    mask = ((x & 0xF0000000) == 0);
    n += mask << 2;  // mask * 4
    x <<= mask << 2; // x <<= (mask * 4)
    // Check if the upper 2 bits are 0
    mask = ((x & 0xC0000000) == 0);
    n += mask << 1;  // mask * 2
    x <<= mask << 1; // x <<= (mask * 2)
    // Check if the highest bit is 0
    mask = ((x & 0x80000000) == 0);
    n += mask;       // mask * 1
    // If x is 0, return 32, otherwise return n
    return (x == 0) ? 32 : n;
}

void clz_software_test(uint32_t input) {
    unsigned int start_cycles, end_cycles;
    __asm__ volatile ("rdcycle %0" : "=r"(start_cycles));
    uint32_t result = clz32(input);
    __asm__ volatile ("rdcycle %0" : "=r"(end_cycles));
    print_str("Software CLZ: ");
    print_hex(result, 2);
    print_str("\n");
    print_str("Cycles: ");
    print_dec(end_cycles - start_cycles);
    print_str("\n");
}

void clz_hardware_test(uint32_t input) {
    unsigned int start_cycles, end_cycles;
    __asm__ volatile ("rdcycle %0" : "=r"(start_cycles));
    // hard_clz() is not shown in this listing; it presumably executes the
    // Zbb clz instruction (for example through inline assembly)
    uint32_t result = hard_clz(input);
    __asm__ volatile ("rdcycle %0" : "=r"(end_cycles));
    print_str("Hardware CLZ: ");
    print_hex(result, 2);
    print_str("\n");
    print_str("Cycles: ");
    print_dec(end_cycles - start_cycles);
    print_str("\n");
}

void clztest(void) {
    uint32_t test_data[3] = {0xFFFFFFFF, 0x00000000, 0x0EAB1234};
    for (int i = 0; i < 3; i++) {
        print_str("Input: ");
        print_hex(test_data[i], 8);
        print_str("\n");
        clz_software_test(test_data[i]);
        clz_hardware_test(test_data[i]);
        print_str("\n");
    }
}
```

The validation result below shows the cycle counts for five CLZ inputs. This run used more inputs, decimal result printing, and different labels than the listing above. In every case the Zbb `clz` path takes 20 cycles, while the RV32I software version takes 101 to 122 cycles.

```
Input: FFFFFFFF
RV32i implemented CLZ: 0
Cycles: 116
RV32i_zbb CLZ: 0
Cycles: 20

Input: 00000000
RV32i implemented CLZ: 32
Cycles: 101
RV32i_zbb CLZ: 32
Cycles: 20

Input: 0EAB1234
RV32i implemented CLZ: 4
Cycles: 117
RV32i_zbb CLZ: 4
Cycles: 20

Input: 000ABCDE
RV32i implemented CLZ: 12
Cycles: 119
RV32i_zbb CLZ: 12
Cycles: 20

Input: 000000AA
RV32i implemented CLZ: 24
Cycles: 122
RV32i_zbb CLZ: 24
Cycles: 20

DONE
```

## Reference

1. [PicoRV32](https://github.com/YosysHQ/picorv32)
2. [PicoRV32 RTL 驗證環境搭建 (Setting up a PicoRV32 RTL verification environment)](https://blog.csdn.net/tugouxp/article/details/127333649)
3. [RISC-V Bit-Manipulation ISA-Extensions](https://www.ece.lsu.edu/ee4720/doc/riscv-bitmanip-1.0.0.pdf)
4. [riscv-tests](https://github.com/riscv-software-src/riscv-tests)