Benchmarking the YUL recompiler

Please note: Benchmarks were conducted using preliminary, unoptimized MVPs of PolkaVM and the new YUL recompiler.

zkEVM
- Host Setup
- Benchmark: Fibonacci Iterative
EVM vs PVM
Ballpark Cost Calculations
Edits

Initial benchmarks of the YUL recompiler, comparing against EVM and zkEVM.

zkEVM by Ethereum Foundation

Host Setup

AMD 7995WX 96core (192vCPU), 256GB RAM, NVME drive
Linux armari-System-Version 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC

Benchmark: Fibonacci iterative

Ideally we'd measure the super circuit. However it still doesn't work.

Thus I measured the sub proof generation for relevant circuits individually. My understanding is that the super proof combinging all tables would speed this up but I lack expertise in the area to tell by how much.

Proof calculation + verification times in seconds based on: https://github.com/xermicus/zkevm-circuits/tree/pvm-benches

Note that I had to adjust circuit degrees (which makes them more costly to proove) in order to fit any reasonable computation in it (the most expensive fibonacci(256) cost 68972 gas according to remix IDE, so around half of what an NFT mint cost).

case	tx	state	evm	keccak	exp	copy	Bytecode	sum	gas (shanghai)
fibonacci(32)	454s	85s	250s	640s	10s	17s	16s	1472s	8716
fibonacci(128)	455s	86s	250s	641s	10s	18s	16s	1476s	34540
fibonacci(256)	465s	92s	264s	652s	18s	26s	24s	1540s	68972

Solidity source and script used to run

```solidity contract Fibonacci { function fibonacci(uint n) external pure returns (uint b) { if (n == 0) { return 0; } uint a = 1; b = 1; for (uint i = 2; i < n; i++) { uint c = a + b; a = b; b = c; } return b; } } ```

solc Version: 0.8.25

Script to run zkEVM benchmark:



















CIRCUITS="bytecode_circuit
copy_circuit
evm_circuit
exp_circuit
keccak_circuit
state_circuit
tx_circuit"
# the super_circuit proof fails to verify

TESTS="fibonacci_call_1"


cd zkevm-circuits/integration-tests

for circuit in $CIRCUITS; do
    for test in $TESTS; do
        cargo test --release --test circuits sub_real_prover::serial_test_${circuit}_${test} -- --nocapture
    done
done

EVM vs PVM

Setup

Host

AMD Ryzen 9 3900 12core (24vCPU), 64GB RAM, NVME drive
Linux 6.8.5-arch1-1 #1 SMP PREEMPT_DYNAMIC

PolkaVM: https://github.com/koute/polkavm/tree/4ca06cc1b7cb435b2b92e81e30c2eb0e988f563e (v0.9.3)
EVM: https://github.com/xermicus/evm/tree/separate-compilation

Cases

Cases are implemented in Solidity. We compile them using solc (0.8.25) to either EVM bytecode directly to run on EVM or to YUL IR for PVM. For PVM, the YUL code is then further translated to LLVM and finally compiled to RISC-V rv32em.

Case	Description	Complexity	Estimated EVM Gas per second (shanghai) vs. wall clock
Baseline	Empty payable contract function that immediatly returns (only checks function selector)	`O(1)`	97m
OddProduct(n)	Naivly calculate the product of first `n` odd numbers (unchecked arithmetics)	`O(n)`	180m
TriangleNumber(n)	Naivly calculate first `n` triangular number (unchecked arithmetics)	O(n)	199m
FibonacciRecursive(n)	`n`th fibonacci number using recrsively	`O(2^n`)	488m
FibonacciIterative(n)	`n`th fibonacci number using iteratively	`O(n)`	478m

Execution vs. Preparation

We assume caching of the PolkaVM JIT artifacts.

Instances can be re-used for executing the same contract multiple times. The bracketted value represents the time spent for setting up instances.

Results: execute

Sources:

https://github.com/xermicus/revive/blob/85bd888f484e78f30bdb1f4a3e6b36911da9b96c/crates/benchmarks/BENCHMARKS.md (generated via make bench-extensive)

Baseline

	`EVM`	`PVMInterpreter`	`PVM`
`0`	`2.51 us` (✅ 1.00x)	`6.15 us` [`2.29 us`] (❌ 2.45x slower)	`107.01 us` [`66.27 us`] (❌ 42.63x slower)

OddPorduct

	`EVM`	`PVMInterpreter`	`PVM`
`300000`	`230.25 ms` (✅ 1.00x)	`135.10 ms` [`2.34 us`] (✅ 1.70x faster)	`2.76 ms` [`66.85 us`] (🚀 83.47x faster)
`1200000`	`925.90 ms` (✅ 1.00x)	`517.73 ms` [`2.34 us`] (✅ 1.79x faster)	`10.62 ms` [`66.85 us`] (🚀 87.20x faster)
`12000000`	`8.88 s` (✅ 1.00x)	`5.58 s` [`2.34 us`] (✅ 1.59x faster)	`99.61 ms` [`66.85 us`] (🚀 89.11x faster)
`180000000`	`132.04 s` (✅ 1.00x)	`77.34 s` [`2.34 us`] (✅ 1.71x faster)	`1.52 s` [`66.85 us`] (🚀 87.07x faster)
`720000000`	`552.75 s` (✅ 1.00x)	`307.87 s` [`2.34 us`] (✅ 1.80x faster)	`6.19 s` [`66.85 us`] (🚀 89.35x faster)

TriangleNumber

	`EVM`	`PVMInterpreter`	`PVM`
`360000`	`175.41 ms` (✅ 1.00x)	`136.21 ms` [`2.51 us`] (✅ 1.29x faster)	`2.62 ms` [`68.55 us`] (🚀 66.95x faster)
`1440000`	`696.75 ms` (✅ 1.00x)	`523.80 ms` [`2.51 us`] (✅ 1.33x faster)	`10.13 ms` [`68.55 us`] (🚀 68.76x faster)
`14400000`	`7.29 s` (✅ 1.00x)	`5.41 s` [`2.51 us`] (✅ 1.35x faster)	`95.18 ms` [`68.55 us`] (🚀 76.63x faster)
`216000000`	`109.97 s` (✅ 1.00x)	`78.94 s` [`2.51 us`] (✅ 1.39x faster)	`1.49 s` [`68.55 us`] (🚀 73.78x faster)
`864000000`	`418.99 s` (✅ 1.00x)	`316.43 s` [`2.51 us`] (✅ 1.32x faster)	`5.96 s` [`68.55 us`] (🚀 70.30x faster)

FibonacciRecursive

	`EVM`	`PVMInterpreter`	`PVM`
`24`	`79.80 ms` (✅ 1.00x)	`183.13 ms` [`2.46 us`] (❌ 2.29x slower)	`2.84 ms` [`69.23 us`] (🚀 28.12x faster)
`27`	`324.23 ms` (✅ 1.00x)	`778.30 ms` [`2.46 us`] (❌ 2.40x slower)	`10.84 ms` [`69.23 us`] (🚀 29.91x faster)
`31`	`2.25 s` (✅ 1.00x)	`5.30 s` [`2.46 us`] (❌ 2.36x slower)	`75.93 ms` [`69.23 us`] (🚀 29.58x faster)
`36`	`24.50 s` (✅ 1.00x)	`55.07 s` [`2.46 us`] (❌ 2.25x slower)	`815.96 ms` [`69.23 us`] (🚀 30.03x faster)
`39`	`103.70 s` (✅ 1.00x)	`241.57 s` [`2.46 us`] (❌ 2.33x slower)	`3.48 s` [`69.23 us`] (🚀 29.80x faster)

FibonacciIterative

	`EVM`	`PVMInterpreter`	`PVM`
`256`	`82.12 us` (✅ 1.00x)	`322.90 us` [`2.43 us`] (❌ 3.93x slower)	`125.41 us` [`67.99 us`] (❌ 1.53x slower)
`162500`	`54.14 ms` (✅ 1.00x)	`192.86 ms` [`2.43 us`] (❌ 3.56x slower)	`2.62 ms` [`67.99 us`] (🚀 20.69x faster)
`650000`	`214.72 ms` (✅ 1.00x)	`774.01 ms` [`2.43 us`] (❌ 3.60x slower)	`10.51 ms` [`67.99 us`] (🚀 20.43x faster)
`6500000`	`2.09 s` (✅ 1.00x)	`7.74 s` [`2.43 us`] (❌ 3.70x slower)	`101.36 ms` [`67.99 us`] (🚀 20.62x faster)
`100000000`	`31.84 s` (✅ 1.00x)	`124.29 s` [`2.43 us`] (❌ 3.90x slower)	`1.51 s` [`67.99 us`] (🚀 21.07x faster)
`400000000`	`127.71 s` (✅ 1.00x)	`473.57 s` [`2.43 us`] (❌ 3.71x slower)	`5.97 s` [`67.99 us`] (🚀 21.37x faster)

Ballpark cost calculations

Inspired by the risc0 zeth blogpost.

Assuming AWS region Frankfurt, instances reserved 1 year, no upfront payment.
Assuming ETH gas per block: 15000000
Assuming best observed of 68972 gas per fibonacci(256): 15000000gas / 68972gas = 217 tx / block

zkEVM

Host: AWS m5ad.24xlarge instance with 96 AMD vCPU cores, 384 GB RAM and 4 x 900 NVMe SSD
Time for a single fibonacci(256) tx =
Time to fill a block with 217 fibonacci(256) tx = 1540s * 217 = 334180s
Monthly instance costs: $2,759.40
Cost to fill a block: 334180s / (30 * 24 * 3600) * $2759.40 = $355.76245833333337
On-chain verification will demand further costs which are not considered here.

PVM (compiler backend)

Host: AWS m5ad.4xlarge instance with 16 AMD vCPU cores, 64gb RAM and 2 x 300 NVMe SSD
Time for a single fibonacci(256) tx, assuming caching of compiled PVM code = instantiation + execution = 79.48us + 44.03us = 123.51us
Time to fill a block with 217 fibonacci(256) tx = 123.51us * 217 = 26801.67us ~= 0.03s
Monthly instance costs: $459.90
Cost to fill a block: 0.03 / (30 * 24 * 3600) * $459.90 = $0.00000532291666667

Verdict

zkEVM block estimated costs: $355.7624587
PVM block estimated costs: $0.000005322916
Roughly, zkEVM block / PVM block = 355.7624587 / 0.000005322916 ~= 66836008
We could have around 66mil PVM validators to get to the same cost per block
Polkadot currently has 1000 validators
In order for zkEVM to be more cost efficitive I should have made grave mistakes that shift the numbers into the favor of the zkEVM by more than four orders of magnitudes;
However the numbers Jan produced for risc0 are not far off from what I got here.

Edits

Update 16.05.2024:

Updated benchmark results using the latest PolkaVM version; Instantiation now counts to execute.

Update 01.05.2024:

Updated benchmark results using the latest PolkaVM version lazy execution in the interpreter

Update 24.04.2024:

Updated benchmark results using the latest compiler version

Update 23.04.2024:

Rerun execute benchmarks with larger parameters
Added note

athei

2024/04/05 08:49:20

111.95 us (❌ 117.10x slower)

Did you use the process caching feature of PolkaVM? Meaning to re-use the same `Engine`? Otherwise spawning the sandbox process could be responsible for much of the static overhead. In reality we will always re-use processes.

Cyrill Leutwiler

2024/04/16 21:19:56

Very good point thanks! I used engine caching and saw massive gains thanks to it. However I was still using InstancePre. Switching to Instance directly helped to bring this it down even more \o/

ordian

2024/05/02 11:34:44

zkEVM block estimated costs: $355.7624587

This seems to high. https://polygon.technology/blog/upgrade-every-evm-chain-to-zk-introducing-the-type-1-prover polygon claims to have proving costs per block to be $0.2-$0.5 and that it is 36X more cost efficient than RISC0, which suggests $18 per block for RISC0.

2024/05/06 13:39:48

Yeah I was surprised as well and I'm of the opinion that this number is quite high. Also we should see more optimizations for ZK proof generation over the next years. However, on the other hand. Expect blog posts about the performance of ZK rollups made by ZK rollup companies to be mostly marketing material (it's risk free for them present numbers around the absolute lower bound, making all assumption optimal, not mentioning some fix costs etc.). Even if some ZK VM claims to generate proofs for many batched transaction for less than $1. In reality the total cost will be much higher than that. Eventually the proof needs to be verified on ETH mainnet, which costs $ in the order of 2 digit numbers according to the risc0 blogpost. Also they assume their cluster, specialized hardware or whatever they are or will be using for proof generation is 100% busy all times, which I can't see holding up since their network is also not operating at 100% capacity all the times. (Edited)

2024/05/06 13:44:02

(Admitted, we too assume 100% capacity, however PVM runs on standard hardware which is much cheapier and easier to repurpose for other workloads if not busy enough)