Rust ISA benchmarks

Codebase: https://github.com/aborg-dev/isa_benchmarks
Index document: https://hackmd.io/@aborg-dev/H1JgUcGCR

This document describes a set of experiments to find the best ISA to use as an intermediate representation when compiling Rust code to Zisk.

This involves:

  • Selecting a set of benchmarks and implementing them in Rust
  • Compiling the resulting programs to a range of ISAs supported by the Rust compiler, with the right optimization flags
  • Running the resulting binaries in QEMU and measuring execution cost as the number and complexity of executed instructions

Results

Summary for Ethereum block

  • rustc -> x86: 250M steps (2.5M with vectorized instructions)
  • rustc -> aarch64: 191M steps
  • rustc -> riscv64gc: 320M steps
    • The ziskemu step count matches the QEMU instruction counter: 320M steps
    • rustc_codegen_cranelift is around 8x slower with default settings
      • The generated code is indeed much less optimized: it lacks inlining, loop unrolling, etc.
  • rustc -> WASM -> x86: 600M steps
  • rustc -> WASM -> w2c2 -> C -> x86: 376M steps

Remaining steps

  • Understand why "rustc -> x86" is so fast for SHA-256
    • So far this seems to be due to more complex instructions, which leads to fewer instructions overall
  • Understand why "rustc -> WASM -> x86" is 10x-15x slower
    • Is inlining to blame for the results? No, the code is properly inlined
    • Or do we need to examine WASM vectorized instructions? Not really, I disabled all of these
    • It could also be due to WASM interpreter overhead. This is the most likely cause, but it needs to be verified
  • Measure the WASM -> Cranelift -> RISC-V path
    • If it lands at the same level, it might be a viable path

Setup

rustup target add riscv64gc-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu

Profiling

Profiling commands:

# For RISC-V
qemu-riscv64 -L /usr/riscv64-linux-gnu -cpu rv64 \
    -plugin tests/tcg/plugins/libinsn.so -d plugin \
    sha_hasher

# For ziskos
ziskemu -i build/input.bin -xm -e sha_hasher

# For x86
qemu-x86_64 \
    -plugin tests/tcg/plugins/libinsn.so -d plugin \
    sha_hasher
    
# For ARM
qemu-aarch64 \
    -plugin tests/tcg/plugins/libinsn.so -d plugin \
    sha_hasher
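
To compare runs, the instruction count has to be pulled out of the text that the `libinsn.so` plugin prints (`cpu 0 insns: …` and `total insns: …`, as in the raw data below). A minimal Rust sketch of that step follows, assuming the plugin output has already been captured into a string. The helper name `parse_total_insns` is hypothetical and is not part of the benchmark codebase.

```rust
/// Extracts the value of the `total insns:` line from text printed by
/// QEMU's `libinsn.so` plugin. Returns `None` if no such line is present.
/// If several such lines appear, the last one wins.
fn parse_total_insns(output: &str) -> Option<u64> {
    output
        .lines()
        .filter_map(|line| line.trim().strip_prefix("total insns:"))
        .filter_map(|rest| rest.trim().parse::<u64>().ok())
        .last()
}

fn main() {
    // Sample copied from the riscv64gc release run in the raw data.
    let sample = "cpu 0 insns: 42868656\ntotal insns: 42868656\n";
    match parse_total_insns(sample) {
        Some(n) => println!("total instructions: {}", n),
        None => println!("no instruction count found"),
    }
}
```

Reading the `total insns` line rather than `cpu 0 insns` matters for runs where the two differ, such as the wasmtime run below.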

Raw data

rustc -> riscv64ima-polygon-ziskos-elf, ziskemu

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    COST_MEMA_R2: 0.00004 sec
    COST_MEMA_W1: 0.00004 sec
    COST_MEMA_W2: 0.00008 sec
    COST_USUAL: 0.000008 sec
    COST_STEP: 0.00005 sec

Total Cost: 6392.55 sec
    Main Cost: 2122.73 sec 42454507 steps
    Mem Cost: 1107.83 sec 110782774 steps
    Mem Align: 21.22 sec 1061034 steps
    Opcodes: 3125.94 sec 1432 steps (40600544 ops)
    Usual: 14.83 sec 1853964 steps
    Memory: 67849727 a reads + 680496 na1 reads + 0 na2 reads + 42062282 a writes + 190269 na1 writes + 0 na2 writes = 68530223 reads + 42252551 writes = 110782774 r/w
    Registy: 65366939 reads + 40389901 writes = 105756840 r/w

Opcodes:
    flag: 0.00 sec (0 steps/op) (40583 ops)
    copyb: 0.00 sec (0 steps/op) (5266028 ops)
    add: 553.92 sec (77 steps/op) (7193707 ops)
    sub: 0.00 sec (77 steps/op) (11 ops)
    ltu: 6.95 sec (77 steps/op) (90237 ops)
    eq: 1.54 sec (77 steps/op) (19953 ops)
    sll: 620.18 sec (109 steps/op) (5689699 ops)
    srl: 8.73 sec (109 steps/op) (80061 ops)
    add_w: 0.00 sec (77 steps/op) (55 ops)
    sub_w: 0.77 sec (77 steps/op) (10005 ops)
    srl_w: 718.31 sec (109 steps/op) (6589961 ops)
    and: 168.70 sec (77 steps/op) (2190887 ops)
    or: 471.96 sec (77 steps/op) (6129357 ops)
    xor: 531.30 sec (77 steps/op) (6899979 ops)
    signextend_b: 17.44 sec (109 steps/op) (160000 ops)
    signextend_w: 26.16 sec (109 steps/op) (240000 ops)
    mul: 0.00 sec (97 steps/op) (20 ops)
    muluh: 0.00 sec (97 steps/op) (1 ops)

process_rom() steps=42454508 duration=0.3956 tp=107.3291 Msteps/s freq=2209.0000 20.5815 clocks/step
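
The figures in this report are consistent with a simple cost model, which the Rust sketch below reproduces for a few entries. The model is inferred from the printed numbers only, not taken from the ziskemu source, so treat it as an assumption: an opcode costs `ops * steps/op` area steps, converted to seconds by dividing by `AREA_PER_SEC`; the main cost is `steps * COST_STEP`; the usual cost is `steps * COST_USUAL`; and each non-aligned read or write is charged at `COST_MEMA_R1` or `COST_MEMA_W1`. The function names are hypothetical. The per-access rate behind "Mem Cost" is not listed in the cost definitions, so it is left out.

```rust
// Constants copied from the "Cost definitions" block of the ziskemu report.
const AREA_PER_SEC: f64 = 1_000_000.0;
const COST_MEMA_R1: f64 = 0.00002;
const COST_MEMA_W1: f64 = 0.00004;
const COST_USUAL: f64 = 0.000008;
const COST_STEP: f64 = 0.00005;

/// Inferred model (assumption): an opcode costs `ops * steps_per_op`
/// area steps, converted to seconds by dividing by AREA_PER_SEC.
fn opcode_cost_sec(ops: u64, steps_per_op: u64) -> f64 {
    (ops * steps_per_op) as f64 / AREA_PER_SEC
}

/// Inferred model (assumption): each non-aligned read is charged
/// COST_MEMA_R1 and each non-aligned write COST_MEMA_W1.
fn mem_align_cost_sec(na1_reads: u64, na1_writes: u64) -> f64 {
    na1_reads as f64 * COST_MEMA_R1 + na1_writes as f64 * COST_MEMA_W1
}

fn main() {
    // Inputs copied from the release report above.
    println!("add:       {:.2} sec", opcode_cost_sec(7_193_707, 77));
    println!("sll:       {:.2} sec", opcode_cost_sec(5_689_699, 109));
    println!("main:      {:.2} sec", 42_454_507.0 * COST_STEP);
    println!("usual:     {:.2} sec", 1_853_964.0 * COST_USUAL);
    println!("mem align: {:.2} sec", mem_align_cost_sec(680_496, 190_269));
}
```

For these entries the report lists 553.92, 620.18, 2122.73, 14.83 and 21.22 sec respectively. Under this model the three shift and add opcodes with the most executions (`add`, `sll`, `srl_w`) account for well over half of the 3125.94 sec opcode cost.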

Debug (failed):

Cost definitions:
    AREA_PER_SEC: 1000000 steps
    COST_MEMA_R1: 0.00002 sec
    COST_MEMA_R2: 0.00004 sec
    COST_MEMA_W1: 0.00004 sec
    COST_MEMA_W2: 0.00008 sec
    COST_USUAL: 0.000008 sec
    COST_STEP: 0.00005 sec

Total Cost: 6193.56 sec
    Main Cost: 0.00 sec 0 steps
    Mem Cost: 2492.99 sec 249299312 steps
    Mem Align: 351.37 sec 17568549 steps
    Opcodes: 3279.52 sec 1695 steps (91290009 ops)
    Usual: 69.68 sec 8709991 steps
    Memory: 153039236 a reads + 5137271 na1 reads + 0 na2 reads + 84907166 a writes + 6215639 na1 writes + 0 na2 writes = 158176507 reads + 91122805 writes = 249299312 r/w
    Registy: 135842625 reads + 61924758 writes = 197767383 r/w

Opcodes:
    flag: 0.00 sec (0 steps/op) (5436123 ops)
    copyb: 0.00 sec (0 steps/op) (47777178 ops)
    add: 953.99 sec (77 steps/op) (12389493 ops)
    sub: 4.54 sec (77 steps/op) (59012 ops)
    ltu: 110.80 sec (77 steps/op) (1439009 ops)
    lt: 2.20 sec (77 steps/op) (28608 ops)
    eq: 16.39 sec (77 steps/op) (212832 ops)
    sll: 281.81 sec (109 steps/op) (2585452 ops)
    srl: 158.03 sec (109 steps/op) (1449818 ops)
    add_w: 140.57 sec (77 steps/op) (1825616 ops)
    sub_w: 7.99 sec (77 steps/op) (103724 ops)
    sll_w: 115.75 sec (109 steps/op) (1061885 ops)
    srl_w: 104.07 sec (109 steps/op) (954784 ops)
    and: 480.65 sec (77 steps/op) (6242245 ops)
    or: 286.44 sec (77 steps/op) (3720043 ops)
    xor: 100.09 sec (77 steps/op) (1299881 ops)
    signextend_b: 6.24 sec (109 steps/op) (57220 ops)
    signextend_w: 486.26 sec (109 steps/op) (4461098 ops)
    mul: 10.93 sec (97 steps/op) (112665 ops)
    divu: 12.76 sec (174 steps/op) (73323 ops)

Error during emulation: EmulationNoCompleted

Crashes under QEMU:

fish: Job SIGSEGV, 'Address boundary error' terminated by signal  ()

rustc -> riscv64gc-unknown-linux-gnu

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 42868656
total insns: 42868656

Debug:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 555737110
total insns: 555737110

rustc_codegen_cranelift -> riscv64gc-unknown-linux-gnu

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342801902
total insns: 342801902

Debug:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 1065241694
total insns: 1065241694

rustc_codegen_cranelift -> x86

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 380275949
total insns: 380275949

rustc -> WASM -> CLIF -> x86

qemu-x86_64 \
  -plugin tests/tcg/plugins/libinsn.so -d plugin \
  ./target/release/wasmtime run -O opt-level=0 -C cache=yes ../hello_world_nozisk/target/wasm32-wasip1/release/sha_hasher.wasm
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 45184403
total insns: 45693833

rustc -> WASM -> CLIF -> RISC-V

[Image not available]

rustc -> x86

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 2998513
total insns: 2998513

Debug:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342708134
total insns: 342708134

rustc -> aarch64

Release:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 20349879
total insns: 20349879

Debug:

n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 417514369
total insns: 417514369
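
The ratios quoted in the results and remaining steps ("around 8x slower", "10x-15x slower") can be checked directly against the release counts above. The Rust sketch below does that arithmetic for the SHA hasher runs. The table is copied from the raw data, the `ratio` helper is hypothetical, and the choice of "rustc -> x86" as the baseline is an assumption made for illustration.

```rust
/// Release-mode `total insns` for the SHA hasher (n = 10000),
/// copied from the QEMU runs in the raw data.
const RELEASE_INSNS: [(&str, u64); 6] = [
    ("rustc -> x86", 2_998_513),
    ("rustc -> aarch64", 20_349_879),
    ("rustc -> riscv64gc", 42_868_656),
    ("rustc -> WASM -> CLIF -> x86", 45_693_833),
    ("rustc_codegen_cranelift -> riscv64gc", 342_801_902),
    ("rustc_codegen_cranelift -> x86", 380_275_949),
];

/// How many times more instructions `count` is than `baseline`.
fn ratio(count: u64, baseline: u64) -> f64 {
    count as f64 / baseline as f64
}

fn main() {
    // Baseline: the first row, "rustc -> x86".
    let x86 = RELEASE_INSNS[0].1;
    for (name, count) in RELEASE_INSNS {
        println!("{:<40} {:>12} {:>8.2}x", name, count, ratio(count, x86));
    }
    // Cranelift backend versus the default backend on the same target.
    println!(
        "cranelift vs rustc on riscv64gc: {:.2}x",
        ratio(RELEASE_INSNS[4].1, RELEASE_INSNS[2].1)
    );
}
```

The wasmtime count covers the whole `wasmtime run` process, so it also includes whatever work the runtime does besides executing the hasher.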