# Benchmarking the YUL recompiler _Please note_: Benchmarks were conducted using preliminary, unoptimized MVPs of PolkaVM and the new YUL recompiler. ## Table of Contents - [zkEVM](#zkEVM-by-Ethereum-Foundation) - [Host Setup](#Host-Setup) - [Benchmark: Fibonacci Iterative](#Benchmark-Fibonacci-iterative) - [EVM vs PVM](#EVM-vs-PVM) - [Setup](#Setup) - [Results: _execute_](#Results-execute) - [Results: _prepare_](#Results-prepare) - [Ballpark Cost Calculations](#Ballpark-cost-calculations) - [Edits](#Edits) Initial benchmarks of the [YUL recompiler](https://github.com/xermicus/revive), comparing against EVM and zkEVM. ## zkEVM by [Ethereum Foundation](https://pse.dev) ### Host Setup - AMD 7995WX 96core (192vCPU), 256GB RAM, NVME drive - Linux armari-System-Version 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC ### Benchmark: Fibonacci iterative Ideally we'd measure the [super circuit](https://github.com/privacy-scaling-explorations/zkevm-specs/blob/master/specs/super_circuit.md). However it [still doesn't work](github.com/privacy-scaling-explorations/zkevm-circuits/issues/1439). Thus I measured the sub proof generation for relevant circuits individually. My understanding is that the super proof combinging all tables would speed this up but I lack expertise in the area to tell by how much. Proof calculation + verification times in seconds based on: https://github.com/xermicus/zkevm-circuits/tree/pvm-benches Note that I had to adjust circuit degrees (which makes them more costly to proove) in order to fit any reasonable computation in it (the most expensive `fibonacci(256)` cost 68972 gas according to remix IDE, so around half of what an NFT mint cost). | case | tx | state | evm | keccak | exp | copy | Bytecode | sum | gas (shanghai) | |-|-|-|-|-|-|-|-|-|-| | fibonacci(32) | 454s | 85s | 250s | 640s | 10s | 17s | 16s | 1472s | 8716 | | fibonacci(128) | 455s | 86s | 250s | 641s | 10s | 18s | 16s | 1476s | 34540 | | fibonacci(256) | 465s | 92s | 264s | 652s | 18s | 26s | 24s | 1540s | 68972 | <details> <Summary>Solidity source and script used to run</Summary> ```solidity contract Fibonacci { function fibonacci(uint n) external pure returns (uint b) { if (n == 0) { return 0; } uint a = 1; b = 1; for (uint i = 2; i < n; i++) { uint c = a + b; a = b; b = c; } return b; } } ``` solc Version: 0.8.25 Script to run zkEVM benchmark: ```bash= CIRCUITS="bytecode_circuit copy_circuit evm_circuit exp_circuit keccak_circuit state_circuit tx_circuit" # the super_circuit proof fails to verify TESTS="fibonacci_call_1" cd zkevm-circuits/integration-tests for circuit in $CIRCUITS; do for test in $TESTS; do cargo test --release --test circuits sub_real_prover::serial_test_${circuit}_${test} -- --nocapture done done ``` </details> ## EVM vs PVM ### Setup #### Host - AMD Ryzen 9 3900 12core (24vCPU), 64GB RAM, NVME drive - Linux 6.8.5-arch1-1 #1 SMP PREEMPT_DYNAMIC PolkaVM: https://github.com/koute/polkavm/tree/4ca06cc1b7cb435b2b92e81e30c2eb0e988f563e (`v0.9.3`) EVM: https://github.com/xermicus/evm/tree/separate-compilation #### Cases Cases are implemented in Solidity. We compile them using `solc` (0.8.25) to either EVM bytecode directly to run on EVM or to YUL IR for PVM. For PVM, the YUL code is then further translated to LLVM and finally compiled to RISC-V rv32em. |Case|Description|Complexity|Estimated EVM Gas per second (shanghai) vs. wall clock| |-|-|-|-| |Baseline|Empty payable contract function that immediatly returns (only checks function selector)|`O(1)`|97m| |OddProduct(n)|Naivly calculate the product of first `n` odd numbers (unchecked arithmetics)|`O(n)`|180m| |TriangleNumber(n)|Naivly calculate first `n` triangular number (unchecked arithmetics)|O(n)|199m| |FibonacciRecursive(n)|`n`th fibonacci number using recrsively|`O(2^n`)|488m| |FibonacciIterative(n)|`n`th fibonacci number using iteratively|`O(n)`|478m| #### Execution vs. Preparation We assume caching of the PolkaVM JIT artifacts. Instances can be re-used for executing the same contract multiple times. The bracketted value represents the time spent for setting up instances. ### Results: _execute_ Sources: - https://github.com/xermicus/revive/blob/85bd888f484e78f30bdb1f4a3e6b36911da9b96c/crates/benchmarks/BENCHMARKS.md (generated via `make bench-extensive`) ### Baseline | | `EVM` | `PVMInterpreter` | `PVM` | |:--------|:------------------------|:-------------------------------|:---------------------------------- | | **`0`** | `2.51 us` (✅ **1.00x**) | `6.15 us` [`2.29 us`] (❌ *2.45x slower*) | `107.01 us` [`66.27 us`] (❌ *42.63x slower*) | ### OddPorduct | | `EVM` | `PVMInterpreter` | `PVM` | |:----------------|:--------------------------|:---------------------------------|:--------------------------------- | | **`300000`** | `230.25 ms` (✅ **1.00x**) | `135.10 ms` [`2.34 us`] (✅ **1.70x faster**) | `2.76 ms` [`66.85 us`] (🚀 **83.47x faster**) | | **`1200000`** | `925.90 ms` (✅ **1.00x**) | `517.73 ms` [`2.34 us`] (✅ **1.79x faster**) | `10.62 ms` [`66.85 us`] (🚀 **87.20x faster**) | | **`12000000`** | `8.88 s` (✅ **1.00x**) | `5.58 s` [`2.34 us`] (✅ **1.59x faster**) | `99.61 ms` [`66.85 us`] (🚀 **89.11x faster**) | | **`180000000`** | `132.04 s` (✅ **1.00x**) | `77.34 s` [`2.34 us`] (✅ **1.71x faster**) | `1.52 s` [`66.85 us`] (🚀 **87.07x faster**) | | **`720000000`** | `552.75 s` (✅ **1.00x**) | `307.87 s` [`2.34 us`] (✅ **1.80x faster**) | `6.19 s` [`66.85 us`] (🚀 **89.35x faster**) | ### TriangleNumber | | `EVM` | `PVMInterpreter` | `PVM` | |:----------------|:--------------------------|:---------------------------------|:--------------------------------- | | **`360000`** | `175.41 ms` (✅ **1.00x**) | `136.21 ms` [`2.51 us`] (✅ **1.29x faster**) | `2.62 ms` [`68.55 us`] (🚀 **66.95x faster**) | | **`1440000`** | `696.75 ms` (✅ **1.00x**) | `523.80 ms` [`2.51 us`] (✅ **1.33x faster**) | `10.13 ms` [`68.55 us`] (🚀 **68.76x faster**) | | **`14400000`** | `7.29 s` (✅ **1.00x**) | `5.41 s` [`2.51 us`] (✅ **1.35x faster**) | `95.18 ms` [`68.55 us`] (🚀 **76.63x faster**) | | **`216000000`** | `109.97 s` (✅ **1.00x**) | `78.94 s` [`2.51 us`] (✅ **1.39x faster**) | `1.49 s` [`68.55 us`] (🚀 **73.78x faster**) | | **`864000000`** | `418.99 s` (✅ **1.00x**) | `316.43 s` [`2.51 us`] (✅ **1.32x faster**) | `5.96 s` [`68.55 us`] (🚀 **70.30x faster**) | ### FibonacciRecursive | | `EVM` | `PVMInterpreter` | `PVM` | |:---------|:--------------------------|:---------------------------------|:---------------------------------- | | **`24`** | `79.80 ms` (✅ **1.00x**) | `183.13 ms` [`2.46 us`] (❌ *2.29x slower*) | `2.84 ms` [`69.23 us`] (🚀 **28.12x faster**) | | **`27`** | `324.23 ms` (✅ **1.00x**) | `778.30 ms` [`2.46 us`] (❌ *2.40x slower*) | `10.84 ms` [`69.23 us`] (🚀 **29.91x faster**) | | **`31`** | `2.25 s` (✅ **1.00x**) | `5.30 s` [`2.46 us`] (❌ *2.36x slower*) | `75.93 ms` [`69.23 us`] (🚀 **29.58x faster**) | | **`36`** | `24.50 s` (✅ **1.00x**) | `55.07 s` [`2.46 us`] (❌ *2.25x slower*) | `815.96 ms` [`69.23 us`] (🚀 **30.03x faster**) | | **`39`** | `103.70 s` (✅ **1.00x**) | `241.57 s` [`2.46 us`] (❌ *2.33x slower*) | `3.48 s` [`69.23 us`] (🚀 **29.80x faster**) | ### FibonacciIterative | | `EVM` | `PVMInterpreter` | `PVM` | |:----------------|:--------------------------|:---------------------------------|:---------------------------------- | | **`256`** | `82.12 us` (✅ **1.00x**) | `322.90 us` [`2.43 us`] (❌ *3.93x slower*) | `125.41 us` [`67.99 us`] (❌ *1.53x slower*) | | **`162500`** | `54.14 ms` (✅ **1.00x**) | `192.86 ms` [`2.43 us`] (❌ *3.56x slower*) | `2.62 ms` [`67.99 us`] (🚀 **20.69x faster**) | | **`650000`** | `214.72 ms` (✅ **1.00x**) | `774.01 ms` [`2.43 us`] (❌ *3.60x slower*) | `10.51 ms` [`67.99 us`] (🚀 **20.43x faster**) | | **`6500000`** | `2.09 s` (✅ **1.00x**) | `7.74 s` [`2.43 us`] (❌ *3.70x slower*) | `101.36 ms` [`67.99 us`] (🚀 **20.62x faster**) | | **`100000000`** | `31.84 s` (✅ **1.00x**) | `124.29 s` [`2.43 us`] (❌ *3.90x slower*) | `1.51 s` [`67.99 us`] (🚀 **21.07x faster**) | | **`400000000`** | `127.71 s` (✅ **1.00x**) | `473.57 s` [`2.43 us`] (❌ *3.71x slower*) | `5.97 s` [`67.99 us`] (🚀 **21.37x faster**) | ## Ballpark cost calculations Inspired by the [risc0 zeth](https://www.risczero.com/blog/zeth-release) blogpost. - [Assuming AWS region Frankfurt](https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/ ), instances reserved 1 year, no upfront payment. - Assuming ETH gas per block: `15000000` - Assuming best observed of 68972 gas per `fibonacci(256)`: `15000000gas / 68972gas` = `217` tx / block ### zkEVM - Host: AWS [m5ad.24xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance with 96 AMD vCPU cores, 384 GB RAM and 4 x 900 NVMe SSD - Time for a single `fibonacci(256)` tx = - Time to fill a block with 217 `fibonacci(256)` tx = 1540s * 217 = 334180s - Monthly instance costs: $2,759.40 - Cost to fill a block: 334180s / (30 * 24 * 3600) * $2759.40 = $355.76245833333337 - On-chain verification will demand further costs which are not considered here. ### PVM (compiler backend) - Host: AWS [m5ad.4xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance with 16 AMD vCPU cores, 64gb RAM and 2 x 300 NVMe SSD - Time for a single `fibonacci(256)` tx, assuming caching of compiled PVM code = instantiation + execution = 79.48us + 44.03us = 123.51us - Time to fill a block with 217 `fibonacci(256)` tx = 123.51us * 217 = 26801.67us ~= 0.03s - Monthly instance costs: $459.90 - Cost to fill a block: 0.03 / (30 * 24 * 3600) * $459.90 = $0.00000532291666667 ### Verdict - zkEVM block estimated costs: $355.7624587 - PVM block estimated costs: $0.000005322916 - Roughly, zkEVM block / PVM block = 355.7624587 / 0.000005322916 ~= 66836008 - We could have around 66mil PVM validators to get to the same cost per block - Polkadot currently has 1000 validators - In order for zkEVM to be more cost efficitive I should have made grave mistakes that shift the numbers into the favor of the zkEVM by more than four orders of magnitudes; - However the numbers Jan produced for risc0 are not far off from what I got here. ## Edits **Update** 16.05.2024: - Updated benchmark results using the latest PolkaVM version; Instantiation now counts to execute. **Update** 01.05.2024: - Updated benchmark results using the latest PolkaVM version lazy execution in the interpreter **Update** 24.04.2024: - Updated benchmark results using the latest compiler version **Update** 23.04.2024: - Rerun execute benchmarks with larger parameters - Added note