Try โ€‚โ€‰HackMD

Benchmarking the YUL recompiler

Please note: Benchmarks were conducted using preliminary, unoptimized MVPs of PolkaVM and the new YUL recompiler.

Table of Contents

Initial benchmarks of the YUL recompiler, comparing against EVM and zkEVM.

zkEVM by Ethereum Foundation

Host Setup

  • AMD 7995WX 96core (192vCPU), 256GB RAM, NVME drive
  • Linux armari-System-Version 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC

Benchmark: Fibonacci iterative

Ideally we'd measure the super circuit. However it still doesn't work.

Thus I measured the sub proof generation for relevant circuits individually. My understanding is that the super proof combinging all tables would speed this up but I lack expertise in the area to tell by how much.

Proof calculation + verification times in seconds based on: https://github.com/xermicus/zkevm-circuits/tree/pvm-benches

Note that I had to adjust circuit degrees (which makes them more costly to proove) in order to fit any reasonable computation in it (the most expensive fibonacci(256) cost 68972 gas according to remix IDE, so around half of what an NFT mint cost).

case tx state evm keccak exp copy Bytecode sum gas (shanghai)
fibonacci(32) 454s 85s 250s 640s 10s 17s 16s 1472s 8716
fibonacci(128) 455s 86s 250s 641s 10s 18s 16s 1476s 34540
fibonacci(256) 465s 92s 264s 652s 18s 26s 24s 1540s 68972
Solidity source and script used to run ```solidity contract Fibonacci { function fibonacci(uint n) external pure returns (uint b) { if (n == 0) { return 0; } uint a = 1; b = 1; for (uint i = 2; i < n; i++) { uint c = a + b; a = b; b = c; } return b; } } ```

solc Version: 0.8.25

Script to run zkEVM benchmark:

CIRCUITS="bytecode_circuit copy_circuit evm_circuit exp_circuit keccak_circuit state_circuit tx_circuit" # the super_circuit proof fails to verify TESTS="fibonacci_call_1" cd zkevm-circuits/integration-tests for circuit in $CIRCUITS; do for test in $TESTS; do cargo test --release --test circuits sub_real_prover::serial_test_${circuit}_${test} -- --nocapture done done

EVM vs PVM

Setup

Host

  • AMD Ryzen 9 3900 12core (24vCPU), 64GB RAM, NVME drive
  • Linux 6.8.5-arch1-1 #1 SMP PREEMPT_DYNAMIC

PolkaVM: https://github.com/koute/polkavm/tree/4ca06cc1b7cb435b2b92e81e30c2eb0e988f563e (v0.9.3)
EVM: https://github.com/xermicus/evm/tree/separate-compilation

Cases

Cases are implemented in Solidity. We compile them using solc (0.8.25) to either EVM bytecode directly to run on EVM or to YUL IR for PVM. For PVM, the YUL code is then further translated to LLVM and finally compiled to RISC-V rv32em.

Case Description Complexity Estimated EVM Gas per second (shanghai) vs. wall clock
Baseline Empty payable contract function that immediatly returns (only checks function selector) O(1) 97m
OddProduct(n) Naivly calculate the product of first n odd numbers (unchecked arithmetics) O(n) 180m
TriangleNumber(n) Naivly calculate first n triangular number (unchecked arithmetics) O(n) 199m
FibonacciRecursive(n) nth fibonacci number using recrsively O(2^n) 488m
FibonacciIterative(n) nth fibonacci number using iteratively O(n) 478m

Execution vs. Preparation

We assume caching of the PolkaVM JIT artifacts.

Instances can be re-used for executing the same contract multiple times. The bracketted value represents the time spent for setting up instances.

Results: execute

Sources:

Baseline

EVM PVMInterpreter PVM
0 2.51 us (โœ… 1.00x) 6.15 us [2.29 us] (โŒ 2.45x slower) 107.01 us [66.27 us] (โŒ 42.63x slower)

OddPorduct

EVM PVMInterpreter PVM
300000 230.25 ms (โœ… 1.00x) 135.10 ms [2.34 us] (โœ… 1.70x faster) 2.76 ms [66.85 us] (๐Ÿš€ 83.47x faster)
1200000 925.90 ms (โœ… 1.00x) 517.73 ms [2.34 us] (โœ… 1.79x faster) 10.62 ms [66.85 us] (๐Ÿš€ 87.20x faster)
12000000 8.88 s (โœ… 1.00x) 5.58 s [2.34 us] (โœ… 1.59x faster) 99.61 ms [66.85 us] (๐Ÿš€ 89.11x faster)
180000000 132.04 s (โœ… 1.00x) 77.34 s [2.34 us] (โœ… 1.71x faster) 1.52 s [66.85 us] (๐Ÿš€ 87.07x faster)
720000000 552.75 s (โœ… 1.00x) 307.87 s [2.34 us] (โœ… 1.80x faster) 6.19 s [66.85 us] (๐Ÿš€ 89.35x faster)

TriangleNumber

EVM PVMInterpreter PVM
360000 175.41 ms (โœ… 1.00x) 136.21 ms [2.51 us] (โœ… 1.29x faster) 2.62 ms [68.55 us] (๐Ÿš€ 66.95x faster)
1440000 696.75 ms (โœ… 1.00x) 523.80 ms [2.51 us] (โœ… 1.33x faster) 10.13 ms [68.55 us] (๐Ÿš€ 68.76x faster)
14400000 7.29 s (โœ… 1.00x) 5.41 s [2.51 us] (โœ… 1.35x faster) 95.18 ms [68.55 us] (๐Ÿš€ 76.63x faster)
216000000 109.97 s (โœ… 1.00x) 78.94 s [2.51 us] (โœ… 1.39x faster) 1.49 s [68.55 us] (๐Ÿš€ 73.78x faster)
864000000 418.99 s (โœ… 1.00x) 316.43 s [2.51 us] (โœ… 1.32x faster) 5.96 s [68.55 us] (๐Ÿš€ 70.30x faster)

FibonacciRecursive

EVM PVMInterpreter PVM
24 79.80 ms (โœ… 1.00x) 183.13 ms [2.46 us] (โŒ 2.29x slower) 2.84 ms [69.23 us] (๐Ÿš€ 28.12x faster)
27 324.23 ms (โœ… 1.00x) 778.30 ms [2.46 us] (โŒ 2.40x slower) 10.84 ms [69.23 us] (๐Ÿš€ 29.91x faster)
31 2.25 s (โœ… 1.00x) 5.30 s [2.46 us] (โŒ 2.36x slower) 75.93 ms [69.23 us] (๐Ÿš€ 29.58x faster)
36 24.50 s (โœ… 1.00x) 55.07 s [2.46 us] (โŒ 2.25x slower) 815.96 ms [69.23 us] (๐Ÿš€ 30.03x faster)
39 103.70 s (โœ… 1.00x) 241.57 s [2.46 us] (โŒ 2.33x slower) 3.48 s [69.23 us] (๐Ÿš€ 29.80x faster)

FibonacciIterative

EVM PVMInterpreter PVM
256 82.12 us (โœ… 1.00x) 322.90 us [2.43 us] (โŒ 3.93x slower) 125.41 us [67.99 us] (โŒ 1.53x slower)
162500 54.14 ms (โœ… 1.00x) 192.86 ms [2.43 us] (โŒ 3.56x slower) 2.62 ms [67.99 us] (๐Ÿš€ 20.69x faster)
650000 214.72 ms (โœ… 1.00x) 774.01 ms [2.43 us] (โŒ 3.60x slower) 10.51 ms [67.99 us] (๐Ÿš€ 20.43x faster)
6500000 2.09 s (โœ… 1.00x) 7.74 s [2.43 us] (โŒ 3.70x slower) 101.36 ms [67.99 us] (๐Ÿš€ 20.62x faster)
100000000 31.84 s (โœ… 1.00x) 124.29 s [2.43 us] (โŒ 3.90x slower) 1.51 s [67.99 us] (๐Ÿš€ 21.07x faster)
400000000 127.71 s (โœ… 1.00x) 473.57 s [2.43 us] (โŒ 3.71x slower) 5.97 s [67.99 us] (๐Ÿš€ 21.37x faster)

Ballpark cost calculations

Inspired by the risc0 zeth blogpost.

  • Assuming AWS region Frankfurt, instances reserved 1 year, no upfront payment.
  • Assuming ETH gas per block: 15000000
  • Assuming best observed of 68972 gas per fibonacci(256): 15000000gas / 68972gas = 217 tx / block

zkEVM

  • Host: AWS m5ad.24xlarge instance with 96 AMD vCPU cores, 384 GB RAM and 4 x 900 NVMe SSD
  • Time for a single fibonacci(256) tx =
  • Time to fill a block with 217 fibonacci(256) tx = 1540s * 217 = 334180s
  • Monthly instance costs: $2,759.40
  • Cost to fill a block: 334180s / (30 * 24 * 3600) * $2759.40 = $355.76245833333337
  • On-chain verification will demand further costs which are not considered here.

PVM (compiler backend)

  • Host: AWS m5ad.4xlarge instance with 16 AMD vCPU cores, 64gb RAM and 2 x 300 NVMe SSD
  • Time for a single fibonacci(256) tx, assuming caching of compiled PVM code = instantiation + execution = 79.48us + 44.03us = 123.51us
  • Time to fill a block with 217 fibonacci(256) tx = 123.51us * 217 = 26801.67us ~= 0.03s
  • Monthly instance costs: $459.90
  • Cost to fill a block: 0.03 / (30 * 24 * 3600) * $459.90 = $0.00000532291666667

Verdict

  • zkEVM block estimated costs: $355.7624587
  • PVM block estimated costs: $0.000005322916
  • Roughly, zkEVM block / PVM block = 355.7624587 / 0.000005322916 ~= 66836008
  • We could have around 66mil PVM validators to get to the same cost per block
  • Polkadot currently has 1000 validators
  • In order for zkEVM to be more cost efficitive I should have made grave mistakes that shift the numbers into the favor of the zkEVM by more than four orders of magnitudes;
  • However the numbers Jan produced for risc0 are not far off from what I got here.

Edits

Update 16.05.2024:

  • Updated benchmark results using the latest PolkaVM version; Instantiation now counts to execute.

Update 01.05.2024:

  • Updated benchmark results using the latest PolkaVM version lazy execution in the interpreter

Update 24.04.2024:

  • Updated benchmark results using the latest compiler version

Update 23.04.2024:

  • Rerun execute benchmarks with larger parameters
  • Added note