# Rust ISA benchmarks
Codebase: https://github.com/aborg-dev/isa_benchmarks
Index document: https://hackmd.io/@aborg-dev/H1JgUcGCR
This document describes a set of experiments to find the best ISA to use as an intermediate representation when compiling Rust code to Zisk.
This involves:
- Selecting a [set of benchmarks](https://github.com/aborg-dev/isa_benchmarks/tree/main/src/bin) and implementing them in Rust
- Compiling the resulting programs to a range of ISAs supported by the Rust compiler, with the right optimization flags
- Running the resulting binaries in QEMU and measuring execution cost as the number and complexity of instructions executed
## Results
### Summary for Ethereum block
- rustc -> x86: 250M steps (2.5M with vectorized instructions)
- rustc -> aarch64: 191M steps
- rustc -> riscv64gc: 320M steps
  - ziskemu steps match the QEMU instruction counter: 320M steps
- `rustc_codegen_cranelift` is around 8x slower with default settings
  - The code is indeed much less optimized; it needs inlining, loop unrolling, etc.
- rustc -> WASM -> x86: 600M steps
- rustc -> WASM -> w2c2 -> C -> x86: 376M steps
### Remaining steps
- Understand why "rustc -> x86" is so fast for SHA-256
  - So far this seems to be due to more complex instructions, which leads to fewer instructions being used overall
- Understand why rustc -> WASM -> x86 is 10x-15x slower
  - Is inlining to blame for the results? No, it is properly inlined
  - Or do we need to examine WASM vectorized instructions? Not really, I disabled all of these
  - It could also be due to WASM interpreter overhead; this is the most likely cause and needs to be verified
- Measure the WASM -> Cranelift -> RISC-V path
  - If we are at the same level, it might be a viable path
## Setup
- QEMU version 9.1
- https://github.com/rust-lang/rustc_codegen_cranelift
- Rustup setup:
```shell
rustup target add riscv64gc-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu
```
### Profiling
Profiling commands:
```shell
# For RISC-V
qemu-riscv64 -L /usr/riscv64-linux-gnu -cpu rv64 \
-plugin tests/tcg/plugins/libinsn.so -d plugin \
sha_hasher
# For ziskos
ziskemu -i build/input.bin -xm -e sha_hasher
# For x86
qemu-x86_64 \
-plugin tests/tcg/plugins/libinsn.so -d plugin \
sha_hasher
# For ARM
qemu-aarch64 \
-plugin tests/tcg/plugins/libinsn.so -d plugin \
sha_hasher
```
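The `libinsn.so` plugin reports its counts in the `cpu 0 insns: N` / `total insns: N` form that appears throughout the raw data in this note. The following is a minimal sketch for pulling the total out of captured output so that runs can be compared programmatically. The function name `parse_total_insns` is hypothetical, and the sketch assumes the output has already been captured as text by whatever means.

```python
import re

# Matches a "total insns: N" line as it appears in the raw data in this note.
_TOTAL_RE = re.compile(r"^\s*total insns:\s*(\d+)\s*$", re.MULTILINE)


def parse_total_insns(output: str) -> int:
    """Extract the total instruction count from captured plugin output."""
    match = _TOTAL_RE.search(output)
    if match is None:
        raise ValueError("no 'total insns:' line found in output")
    return int(match.group(1))


if __name__ == "__main__":
    # Sample copied from the riscv64gc release run recorded in this note.
    sample = (
        "cpu 0 insns: 42868656\n"
        "total insns: 42868656\n"
    )
    print(parse_total_insns(sample))
```

Matching on the `total insns:` line rather than the per-CPU line keeps the helper usable for runs where the two values differ.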
## Raw data
### rustc -> riscv64ima-polygon-ziskos-elf, ziskemu
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
Cost definitions:
AREA_PER_SEC: 1000000 steps
COST_MEMA_R1: 0.00002 sec
COST_MEMA_R2: 0.00004 sec
COST_MEMA_W1: 0.00004 sec
COST_MEMA_W2: 0.00008 sec
COST_USUAL: 0.000008 sec
COST_STEP: 0.00005 sec
Total Cost: 6392.55 sec
Main Cost: 2122.73 sec 42454507 steps
Mem Cost: 1107.83 sec 110782774 steps
Mem Align: 21.22 sec 1061034 steps
Opcodes: 3125.94 sec 1432 steps (40600544 ops)
Usual: 14.83 sec 1853964 steps
Memory: 67849727 a reads + 680496 na1 reads + 0 na2 reads + 42062282 a writes + 190269 na1 writes + 0 na2 writes = 68530223 reads + 42252551 writes = 110782774 r/w
Registy: 65366939 reads + 40389901 writes = 105756840 r/w
Opcodes:
flag: 0.00 sec (0 steps/op) (40583 ops)
copyb: 0.00 sec (0 steps/op) (5266028 ops)
add: 553.92 sec (77 steps/op) (7193707 ops)
sub: 0.00 sec (77 steps/op) (11 ops)
ltu: 6.95 sec (77 steps/op) (90237 ops)
eq: 1.54 sec (77 steps/op) (19953 ops)
sll: 620.18 sec (109 steps/op) (5689699 ops)
srl: 8.73 sec (109 steps/op) (80061 ops)
add_w: 0.00 sec (77 steps/op) (55 ops)
sub_w: 0.77 sec (77 steps/op) (10005 ops)
srl_w: 718.31 sec (109 steps/op) (6589961 ops)
and: 168.70 sec (77 steps/op) (2190887 ops)
or: 471.96 sec (77 steps/op) (6129357 ops)
xor: 531.30 sec (77 steps/op) (6899979 ops)
signextend_b: 17.44 sec (109 steps/op) (160000 ops)
signextend_w: 26.16 sec (109 steps/op) (240000 ops)
mul: 0.00 sec (97 steps/op) (20 ops)
muluh: 0.00 sec (97 steps/op) (1 ops)
process_rom() steps=42454508 duration=0.3956 tp=107.3291 Msteps/s freq=2209.0000 20.5815 clocks/step
```
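The per-opcode costs in the release output appear to follow `ops * steps_per_op / AREA_PER_SEC`, and the main cost appears to be `steps * COST_STEP`. This relationship is inferred from the numbers above, not taken from the ziskemu source, so the sketch below is only a sanity check on the reported figures. The helper names are illustrative.

```python
AREA_PER_SEC = 1_000_000  # "AREA_PER_SEC: 1000000 steps" in the output above
COST_STEP = 0.00005       # "COST_STEP: 0.00005 sec" in the output above

# opcode -> (steps per op, op count, reported cost in sec), from the release run.
OPCODES = {
    "add": (77, 7193707, 553.92),
    "sll": (109, 5689699, 620.18),
    "srl_w": (109, 6589961, 718.31),
    "xor": (77, 6899979, 531.30),
}


def opcode_cost(steps_per_op: int, ops: int) -> float:
    """Assumed relationship: area used by an opcode, expressed in seconds."""
    return steps_per_op * ops / AREA_PER_SEC


def main_cost(steps: int) -> float:
    """Assumed relationship: main trace cost is a flat charge per step."""
    return steps * COST_STEP


if __name__ == "__main__":
    for name, (steps_per_op, ops, reported) in OPCODES.items():
        computed = opcode_cost(steps_per_op, ops)
        print(f"{name:6s} computed={computed:.2f} reported={reported:.2f}")
    print(f"main   computed={main_cost(42454507):.2f} reported=2122.73")
```

If the inferred formula holds, the shift opcodes (`sll`, `srl_w`) are expensive both because they are frequent and because they cost 109 steps per op instead of 77.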
Debug (failed):
```
Cost definitions:
AREA_PER_SEC: 1000000 steps
COST_MEMA_R1: 0.00002 sec
COST_MEMA_R2: 0.00004 sec
COST_MEMA_W1: 0.00004 sec
COST_MEMA_W2: 0.00008 sec
COST_USUAL: 0.000008 sec
COST_STEP: 0.00005 sec
Total Cost: 6193.56 sec
Main Cost: 0.00 sec 0 steps
Mem Cost: 2492.99 sec 249299312 steps
Mem Align: 351.37 sec 17568549 steps
Opcodes: 3279.52 sec 1695 steps (91290009 ops)
Usual: 69.68 sec 8709991 steps
Memory: 153039236 a reads + 5137271 na1 reads + 0 na2 reads + 84907166 a writes + 6215639 na1 writes + 0 na2 writes = 158176507 reads + 91122805 writes = 249299312 r/w
Registy: 135842625 reads + 61924758 writes = 197767383 r/w
Opcodes:
flag: 0.00 sec (0 steps/op) (5436123 ops)
copyb: 0.00 sec (0 steps/op) (47777178 ops)
add: 953.99 sec (77 steps/op) (12389493 ops)
sub: 4.54 sec (77 steps/op) (59012 ops)
ltu: 110.80 sec (77 steps/op) (1439009 ops)
lt: 2.20 sec (77 steps/op) (28608 ops)
eq: 16.39 sec (77 steps/op) (212832 ops)
sll: 281.81 sec (109 steps/op) (2585452 ops)
srl: 158.03 sec (109 steps/op) (1449818 ops)
add_w: 140.57 sec (77 steps/op) (1825616 ops)
sub_w: 7.99 sec (77 steps/op) (103724 ops)
sll_w: 115.75 sec (109 steps/op) (1061885 ops)
srl_w: 104.07 sec (109 steps/op) (954784 ops)
and: 480.65 sec (77 steps/op) (6242245 ops)
or: 286.44 sec (77 steps/op) (3720043 ops)
xor: 100.09 sec (77 steps/op) (1299881 ops)
signextend_b: 6.24 sec (109 steps/op) (57220 ops)
signextend_w: 486.26 sec (109 steps/op) (4461098 ops)
mul: 10.93 sec (97 steps/op) (112665 ops)
divu: 12.76 sec (174 steps/op) (73323 ops)
Error during emulation: EmulationNoCompleted
```
Crashes under QEMU:
```
fish: Job SIGSEGV, 'Address boundary error' terminated by signal ()
```
### rustc -> riscv64gc-unknown-linux-gnu
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 42868656
total insns: 42868656
```
Debug:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 555737110
total insns: 555737110
```
### rustc_codegen_cranelift -> riscv64gc-unknown-linux-gnu
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342801902
total insns: 342801902
```
Debug:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 1065241694
total insns: 1065241694
```
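As a cross-check, the instruction counts recorded above for the two riscv64gc backends on `sha_hasher` can be compared directly; the release ratio comes out close to the "around 8x" figure quoted in the summary. This is a small sketch using only counts from this note, and the constant and function names are illustrative.

```python
# Instruction counts for sha_hasher on riscv64gc, copied from this note.
LLVM_RELEASE = 42_868_656        # rustc with the default LLVM backend
CRANELIFT_RELEASE = 342_801_902  # rustc_codegen_cranelift
LLVM_DEBUG = 555_737_110
CRANELIFT_DEBUG = 1_065_241_694


def slowdown(candidate: int, baseline: int) -> float:
    """How many times more instructions the candidate executes."""
    return candidate / baseline


if __name__ == "__main__":
    print(f"release: {slowdown(CRANELIFT_RELEASE, LLVM_RELEASE):.2f}x")
    print(f"debug:   {slowdown(CRANELIFT_DEBUG, LLVM_DEBUG):.2f}x")
```

The gap is much smaller in debug builds, which is consistent with the difference coming mostly from optimizations that the default backend applies in release mode.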
### rustc_codegen_cranelift -> x86
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 380275949
total insns: 380275949
```
### rustc -> WASM -> CLIF -> x86
```
qemu-x86_64 \
-plugin tests/tcg/plugins/libinsn.so -d plugin \
./target/release/wasmtime run -O opt-level=0 -C cache=yes ../hello_world_nozisk/target/wasm32-wasip1/release/sha_hasher.wasm
```
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 45184403
total insns: 45693833
```
### rustc -> WASM -> CLIF -> RISC-V
:construction:
### rustc -> x86
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 2998513
total insns: 2998513
```
Debug:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 342708134
total insns: 342708134
```
### rustc -> aarch64
Release:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 20349879
total insns: 20349879
```
Debug:
```
n:10000 [82, 229, 228, 9, 207, 11, 252, 118, 235, 27, 13, 44, 75, 164, 54, 106, 253, 126, 193, 14, 54, 32, 188, 119, 81, 120, 47, 45, 222, 206, 161, 159]
cpu 0 insns: 417514369
total insns: 417514369
```