# Rust ISA benchmarks

Codebase: https://github.com/aborg-dev/isa_benchmarks

Index document: https://hackmd.io/@aborg-dev/H1JgUcGCR

This document describes the set of experiments to find the best ISA that we can use as an intermediate representation when compiling Rust code to Zisk.

This involves:
- Selecting a [set of benchmarks](https://github.com/aborg-dev/isa_benchmarks/tree/main/src/bin) and implementing them in Rust
- Compiling the resulting programs to a range of ISAs supported by the Rust compiler with the right optimization flags
- Running the resulting binaries in QEMU and measuring the execution costs in the number and complexity of instructions

## Results

### Summary for Ethereum block

- rustc -> x86: 250M steps (2.5M with vectorized instructions)
- rustc -> aarch64: 191M steps
- rustc -> riscv64gc: 320M steps
  - ziskemu steps match qemu instruction counter: 320M steps
  - `rustc_codegen_cranelift` is around 8x slower with default settings
    - Code is indeed much less optimized, needs inlining, loop unrolling, etc.
- rustc -> WASM -> x86: 600M steps
- rustc -> WASM -> w2c2 -> C -> x86: 376M steps

### Remaining steps

- Understand why "rustc -> x86" is so fast for SHA 256
  - So far seems to be due to more complex instructions, which leads to fewer instructions used overall
- Understand why rustc -> WASM -> x86 is 10x-15x slower
  - Is inlining to blame for the results? No, it is properly inlined
  - Or do we need to examine WASM vectorized instructions?
    Not really, I disabled all of these
  - Also could be due to WASM interpreter overhead - the most likely one, need to verify
- Measure WASM -> Cranelift -> RISC-V path
  - If we are at the same level, might be a viable path

## Setup

- QEMU version 9.1
- https://github.com/rust-lang/rustc_codegen_cranelift
- Rustup setup:
  ```
  rustup target add riscv64gc-unknown-linux-gnu
  rustup target add aarch64-unknown-linux-gnu
  ```

### Profiling

Profiling commands:

```shell
# For RISC-V
qemu-riscv64 -L /usr/riscv64-linux-gnu -cpu rv64 \
  -plugin tests/tcg/plugins/libinsn.so -d plugin \
  sha_hasher

# For ziskos
ziskemu -i build/input.bin -xm -e sha_hasher

# For x86
qemu-x86_64 \
  -plugin tests/tcg/plugins/libinsn.so -d plugin \
  sha_hasher

# For ARM
qemu-aarch64 \
  -plugin tests/tcg/plugins/libinsn.so -d plugin \
  sha_hasher
```

## Raw data

### rustc -> riscv64ima-polygon-ziskos-elf, ziskemu

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    COST_MEMA_R2: 0.00004 sec
    COST_MEMA_W1: 0.00004 sec
    COST_MEMA_W2: 0.00008 sec
    COST_USUAL: 0.000008 sec
    COST_STEP: 0.00005 sec

Total Cost: 6392.55 sec
    Main Cost: 2122.73 sec 42454507 steps
    Mem Cost: 1107.83 sec 110782774 steps
    Mem Align: 21.22 sec 1061034 steps
    Opcodes: 3125.94 sec 1432 steps (40600544 ops)
    Usual: 14.83 sec 1853964 steps
    Memory: 67849727 a reads + 680496 na1 reads + 0 na2 reads + 42062282 a writes + 190269 na1 writes + 0 na2 writes = 68530223 reads + 42252551 writes = 110782774 r/w
    Registy: 65366939 reads + 40389901 writes = 105756840 r/w

Opcodes:
    flag: 0.00 sec (0 steps/op) (40583 ops)
    copyb: 0.00 sec (0 steps/op) (5266028 ops)
    add: 553.92 sec (77 steps/op) (7193707 ops)
    sub: 0.00 sec (77 steps/op) (11 ops)
    ltu: 6.95 sec (77 steps/op) (90237 ops)
    eq: 1.54 sec (77 steps/op) (19953 ops)
    sll: 620.18 sec (109 steps/op) (5689699 ops)
    srl: 8.73 sec (109 steps/op) (80061 ops)
    add_w: 0.00 sec (77 steps/op) (55 ops)
    sub_w: 0.77 sec (77 steps/op) (10005 ops)
    srl_w: 718.31 sec (109 steps/op) (6589961 ops)
    and: 168.70 sec (77 steps/op) (2190887 ops)
    or: 471.96 sec (77 steps/op) (6129357 ops)
    xor: 531.30 sec (77 steps/op) (6899979 ops)
    signextend_b: 17.44 sec (109 steps/op) (160000 ops)
    signextend_w: 26.16 sec (109 steps/op) (240000 ops)
    mul: 0.00 sec (97 steps/op) (20 ops)
    muluh: 0.00 sec (97 steps/op) (1 ops)

process_rom() steps=42454508 duration=0.3956 tp=107.3291 Msteps/s freq=2209.0000 20.5815 clocks/step
```

Debug (failed):
```
Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    COST_MEMA_R2: 0.00004 sec
    COST_MEMA_W1: 0.00004 sec
    COST_MEMA_W2: 0.00008 sec
    COST_USUAL: 0.000008 sec
    COST_STEP: 0.00005 sec

Total Cost: 6193.56 sec
    Main Cost: 0.00 sec 0 steps
    Mem Cost: 2492.99 sec 249299312 steps
    Mem Align: 351.37 sec 17568549 steps
    Opcodes: 3279.52 sec 1695 steps (91290009 ops)
    Usual: 69.68 sec 8709991 steps
    Memory: 153039236 a reads + 5137271 na1 reads + 0 na2 reads + 84907166 a writes + 6215639 na1 writes + 0 na2 writes = 158176507 reads + 91122805 writes = 249299312 r/w
    Registy: 135842625 reads + 61924758 writes = 197767383 r/w

Opcodes:
    flag: 0.00 sec (0 steps/op) (5436123 ops)
    copyb: 0.00 sec (0 steps/op) (47777178 ops)
    add: 953.99 sec (77 steps/op) (12389493 ops)
    sub: 4.54 sec (77 steps/op) (59012 ops)
    ltu: 110.80 sec (77 steps/op) (1439009 ops)
    lt: 2.20 sec (77 steps/op) (28608 ops)
    eq: 16.39 sec (77 steps/op) (212832 ops)
    sll: 281.81 sec (109 steps/op) (2585452 ops)
    srl: 158.03 sec (109 steps/op) (1449818 ops)
    add_w: 140.57 sec (77 steps/op) (1825616 ops)
    sub_w: 7.99 sec (77 steps/op) (103724 ops)
    sll_w: 115.75 sec (109 steps/op) (1061885 ops)
    srl_w: 104.07 sec (109 steps/op) (954784 ops)
    and: 480.65 sec (77 steps/op) (6242245 ops)
    or: 286.44 sec (77 steps/op) (3720043 ops)
    xor: 100.09 sec (77 steps/op) (1299881 ops)
    signextend_b: 6.24 sec (109 steps/op) (57220 ops)
    signextend_w: 486.26 sec (109 steps/op) (4461098 ops)
    mul: 10.93 sec (97 steps/op) (112665 ops)
    divu: 12.76 sec (174 steps/op) (73323 ops)

Error during emulation: EmulationNoCompleted
```

Crashes under QEMU:
```
fish: Job SIGSEGV, 'Address boundary error' terminated by signal ()
```

### rustc -> riscv64gc-unknown-linux-gnu

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 42868656
total insns: 42868656
```

Debug:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 555737110
total insns: 555737110
```

### rustc_codegen_cranelift -> riscv64gc-unknown-linux-gnu

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342801902
total insns: 342801902
```

Debug:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 1065241694
total insns: 1065241694
```

### rustc_codegen_cranelift -> x86

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 380275949
total insns: 380275949
```

### rustc -> WASM -> CLIF -> x86

```
qemu-x86_64 \
  -plugin tests/tcg/plugins/libinsn.so -d plugin \
  ./target/release/wasmtime run -O opt-level=0 -C cache=yes ../hello_world_nozisk/target/wasm32-wasip1/release/sha_hasher.wasm
```

```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 45184403
total insns: 45693833
```

### rustc -> WASM -> CLIF -> RISC-V

:construction:

### rustc -> x86

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 2998513
total insns: 2998513
```

Debug:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342708134
total insns: 342708134
```

### rustc -> aarch64

Release:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 20349879
total insns: 20349879
```

Debug:
```
n:10000
[82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 417514369
total insns: 417514369
```
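
### Comparing instruction counts

The raw counts above are easier to compare as ratios against one baseline. Below is a minimal sketch of how that could be scripted, assuming the plugin output keeps the `total insns: N` line format shown in the logs. The helper names `total_insns` and `ratio` are hypothetical and not part of the isa_benchmarks repository, and the choice of the rustc -> riscv64gc release build as baseline is only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the isa_benchmarks repository.

# Read libinsn plugin output on stdin and print the number
# from its "total insns:" line.
total_insns() {
    awk '/total insns:/ { print $NF }'
}

# Print count / baseline with two decimals, i.e. how many times more
# instructions a build executes than the baseline build.
ratio() {
    awk -v n="$1" -v base="$2" 'BEGIN { printf "%.2f\n", n / base }'
}

# Release counts copied from the raw data in this document;
# the baseline is rustc -> riscv64gc (release).
baseline=42868656
echo "rustc_codegen_cranelift -> riscv64gc: $(ratio 342801902 "$baseline")x"
echo "rustc -> aarch64:                     $(ratio 20349879 "$baseline")x"
echo "rustc -> x86:                         $(ratio 2998513 "$baseline")x"
```

If a profiling run is saved to a file, `total_insns < run.log` (the file name is an assumption) yields the count to pass as the first argument of `ratio`.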