# Benchmarking the YUL recompiler
_Please note_: Benchmarks were conducted using preliminary, unoptimized MVPs of PolkaVM and the new YUL recompiler.
## Table of Contents
- [zkEVM](#zkEVM-by-Ethereum-Foundation)
- [Host Setup](#Host-Setup)
- [Benchmark: Fibonacci Iterative](#Benchmark-Fibonacci-iterative)
- [EVM vs PVM](#EVM-vs-PVM)
- [Setup](#Setup)
- [Results: _execute_](#Results-execute)
- [Results: _prepare_](#Results-prepare)
- [Ballpark Cost Calculations](#Ballpark-cost-calculations)
- [Edits](#Edits)
Initial benchmarks of the [YUL recompiler](https://github.com/xermicus/revive), comparing against EVM and zkEVM.
## zkEVM by [Ethereum Foundation](https://pse.dev)
### Host Setup
- AMD 7995WX 96core (192vCPU), 256GB RAM, NVME drive
- Linux armari-System-Version 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC
### Benchmark: Fibonacci iterative
Ideally we'd measure the [super circuit](https://github.com/privacy-scaling-explorations/zkevm-specs/blob/master/specs/super_circuit.md). However it [still doesn't work](github.com/privacy-scaling-explorations/zkevm-circuits/issues/1439).
Thus I measured the sub proof generation for relevant circuits individually. My understanding is that the super proof combinging all tables would speed this up but I lack expertise in the area to tell by how much.
Proof calculation + verification times in seconds based on: https://github.com/xermicus/zkevm-circuits/tree/pvm-benches
Note that I had to adjust circuit degrees (which makes them more costly to proove) in order to fit any reasonable computation in it (the most expensive `fibonacci(256)` cost 68972 gas according to remix IDE, so around half of what an NFT mint cost).
| case | tx | state | evm | keccak | exp | copy | Bytecode | sum | gas (shanghai) |
|-|-|-|-|-|-|-|-|-|-|
| fibonacci(32) | 454s | 85s | 250s | 640s | 10s | 17s | 16s | 1472s | 8716 |
| fibonacci(128) | 455s | 86s | 250s | 641s | 10s | 18s | 16s | 1476s | 34540 |
| fibonacci(256) | 465s | 92s | 264s | 652s | 18s | 26s | 24s | 1540s | 68972 |
<details>
<Summary>Solidity source and script used to run</Summary>
```solidity
contract Fibonacci {
function fibonacci(uint n) external pure returns (uint b) {
if (n == 0) {
return 0;
}
uint a = 1;
b = 1;
for (uint i = 2; i < n; i++) {
uint c = a + b;
a = b;
b = c;
}
return b;
}
}
```
solc Version: 0.8.25
Script to run zkEVM benchmark:
```bash=
CIRCUITS="bytecode_circuit
copy_circuit
evm_circuit
exp_circuit
keccak_circuit
state_circuit
tx_circuit"
# the super_circuit proof fails to verify
TESTS="fibonacci_call_1"
cd zkevm-circuits/integration-tests
for circuit in $CIRCUITS; do
for test in $TESTS; do
cargo test --release --test circuits sub_real_prover::serial_test_${circuit}_${test} -- --nocapture
done
done
```
</details>
## EVM vs PVM
### Setup
#### Host
- AMD Ryzen 9 3900 12core (24vCPU), 64GB RAM, NVME drive
- Linux 6.8.5-arch1-1 #1 SMP PREEMPT_DYNAMIC
PolkaVM: https://github.com/koute/polkavm/tree/4ca06cc1b7cb435b2b92e81e30c2eb0e988f563e (`v0.9.3`)
EVM: https://github.com/xermicus/evm/tree/separate-compilation
#### Cases
Cases are implemented in Solidity. We compile them using `solc` (0.8.25) to either EVM bytecode directly to run on EVM or to YUL IR for PVM. For PVM, the YUL code is then further translated to LLVM and finally compiled to RISC-V rv32em.
|Case|Description|Complexity|Estimated EVM Gas per second (shanghai) vs. wall clock|
|-|-|-|-|
|Baseline|Empty payable contract function that immediatly returns (only checks function selector)|`O(1)`|97m|
|OddProduct(n)|Naivly calculate the product of first `n` odd numbers (unchecked arithmetics)|`O(n)`|180m|
|TriangleNumber(n)|Naivly calculate first `n` triangular number (unchecked arithmetics)|O(n)|199m|
|FibonacciRecursive(n)|`n`th fibonacci number using recrsively|`O(2^n`)|488m|
|FibonacciIterative(n)|`n`th fibonacci number using iteratively|`O(n)`|478m|
#### Execution vs. Preparation
We assume caching of the PolkaVM JIT artifacts.
Instances can be re-used for executing the same contract multiple times. The bracketted value represents the time spent for setting up instances.
### Results: _execute_
Sources:
- https://github.com/xermicus/revive/blob/85bd888f484e78f30bdb1f4a3e6b36911da9b96c/crates/benchmarks/BENCHMARKS.md (generated via `make bench-extensive`)
### Baseline
| | `EVM` | `PVMInterpreter` | `PVM` |
|:--------|:------------------------|:-------------------------------|:---------------------------------- |
| **`0`** | `2.51 us` (✅ **1.00x**) | `6.15 us` [`2.29 us`] (❌ *2.45x slower*) | `107.01 us` [`66.27 us`] (❌ *42.63x slower*) |
### OddPorduct
| | `EVM` | `PVMInterpreter` | `PVM` |
|:----------------|:--------------------------|:---------------------------------|:--------------------------------- |
| **`300000`** | `230.25 ms` (✅ **1.00x**) | `135.10 ms` [`2.34 us`] (✅ **1.70x faster**) | `2.76 ms` [`66.85 us`] (🚀 **83.47x faster**) |
| **`1200000`** | `925.90 ms` (✅ **1.00x**) | `517.73 ms` [`2.34 us`] (✅ **1.79x faster**) | `10.62 ms` [`66.85 us`] (🚀 **87.20x faster**) |
| **`12000000`** | `8.88 s` (✅ **1.00x**) | `5.58 s` [`2.34 us`] (✅ **1.59x faster**) | `99.61 ms` [`66.85 us`] (🚀 **89.11x faster**) |
| **`180000000`** | `132.04 s` (✅ **1.00x**) | `77.34 s` [`2.34 us`] (✅ **1.71x faster**) | `1.52 s` [`66.85 us`] (🚀 **87.07x faster**) |
| **`720000000`** | `552.75 s` (✅ **1.00x**) | `307.87 s` [`2.34 us`] (✅ **1.80x faster**) | `6.19 s` [`66.85 us`] (🚀 **89.35x faster**) |
### TriangleNumber
| | `EVM` | `PVMInterpreter` | `PVM` |
|:----------------|:--------------------------|:---------------------------------|:--------------------------------- |
| **`360000`** | `175.41 ms` (✅ **1.00x**) | `136.21 ms` [`2.51 us`] (✅ **1.29x faster**) | `2.62 ms` [`68.55 us`] (🚀 **66.95x faster**) |
| **`1440000`** | `696.75 ms` (✅ **1.00x**) | `523.80 ms` [`2.51 us`] (✅ **1.33x faster**) | `10.13 ms` [`68.55 us`] (🚀 **68.76x faster**) |
| **`14400000`** | `7.29 s` (✅ **1.00x**) | `5.41 s` [`2.51 us`] (✅ **1.35x faster**) | `95.18 ms` [`68.55 us`] (🚀 **76.63x faster**) |
| **`216000000`** | `109.97 s` (✅ **1.00x**) | `78.94 s` [`2.51 us`] (✅ **1.39x faster**) | `1.49 s` [`68.55 us`] (🚀 **73.78x faster**) |
| **`864000000`** | `418.99 s` (✅ **1.00x**) | `316.43 s` [`2.51 us`] (✅ **1.32x faster**) | `5.96 s` [`68.55 us`] (🚀 **70.30x faster**) |
### FibonacciRecursive
| | `EVM` | `PVMInterpreter` | `PVM` |
|:---------|:--------------------------|:---------------------------------|:---------------------------------- |
| **`24`** | `79.80 ms` (✅ **1.00x**) | `183.13 ms` [`2.46 us`] (❌ *2.29x slower*) | `2.84 ms` [`69.23 us`] (🚀 **28.12x faster**) |
| **`27`** | `324.23 ms` (✅ **1.00x**) | `778.30 ms` [`2.46 us`] (❌ *2.40x slower*) | `10.84 ms` [`69.23 us`] (🚀 **29.91x faster**) |
| **`31`** | `2.25 s` (✅ **1.00x**) | `5.30 s` [`2.46 us`] (❌ *2.36x slower*) | `75.93 ms` [`69.23 us`] (🚀 **29.58x faster**) |
| **`36`** | `24.50 s` (✅ **1.00x**) | `55.07 s` [`2.46 us`] (❌ *2.25x slower*) | `815.96 ms` [`69.23 us`] (🚀 **30.03x faster**) |
| **`39`** | `103.70 s` (✅ **1.00x**) | `241.57 s` [`2.46 us`] (❌ *2.33x slower*) | `3.48 s` [`69.23 us`] (🚀 **29.80x faster**) |
### FibonacciIterative
| | `EVM` | `PVMInterpreter` | `PVM` |
|:----------------|:--------------------------|:---------------------------------|:---------------------------------- |
| **`256`** | `82.12 us` (✅ **1.00x**) | `322.90 us` [`2.43 us`] (❌ *3.93x slower*) | `125.41 us` [`67.99 us`] (❌ *1.53x slower*) |
| **`162500`** | `54.14 ms` (✅ **1.00x**) | `192.86 ms` [`2.43 us`] (❌ *3.56x slower*) | `2.62 ms` [`67.99 us`] (🚀 **20.69x faster**) |
| **`650000`** | `214.72 ms` (✅ **1.00x**) | `774.01 ms` [`2.43 us`] (❌ *3.60x slower*) | `10.51 ms` [`67.99 us`] (🚀 **20.43x faster**) |
| **`6500000`** | `2.09 s` (✅ **1.00x**) | `7.74 s` [`2.43 us`] (❌ *3.70x slower*) | `101.36 ms` [`67.99 us`] (🚀 **20.62x faster**) |
| **`100000000`** | `31.84 s` (✅ **1.00x**) | `124.29 s` [`2.43 us`] (❌ *3.90x slower*) | `1.51 s` [`67.99 us`] (🚀 **21.07x faster**) |
| **`400000000`** | `127.71 s` (✅ **1.00x**) | `473.57 s` [`2.43 us`] (❌ *3.71x slower*) | `5.97 s` [`67.99 us`] (🚀 **21.37x faster**) |
## Ballpark cost calculations
Inspired by the [risc0 zeth](https://www.risczero.com/blog/zeth-release) blogpost.
- [Assuming AWS region Frankfurt](https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/ ), instances reserved 1 year, no upfront payment.
- Assuming ETH gas per block: `15000000`
- Assuming best observed of 68972 gas per `fibonacci(256)`: `15000000gas / 68972gas` = `217` tx / block
### zkEVM
- Host: AWS [m5ad.24xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance with 96 AMD vCPU cores, 384 GB RAM and 4 x 900 NVMe SSD
- Time for a single `fibonacci(256)` tx =
- Time to fill a block with 217 `fibonacci(256)` tx = 1540s * 217 = 334180s
- Monthly instance costs: $2,759.40
- Cost to fill a block: 334180s / (30 * 24 * 3600) * $2759.40 = $355.76245833333337
- On-chain verification will demand further costs which are not considered here.
### PVM (compiler backend)
- Host: AWS [m5ad.4xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance with 16 AMD vCPU cores, 64gb RAM and 2 x 300 NVMe SSD
- Time for a single `fibonacci(256)` tx, assuming caching of compiled PVM code = instantiation + execution = 79.48us + 44.03us = 123.51us
- Time to fill a block with 217 `fibonacci(256)` tx = 123.51us * 217 = 26801.67us ~= 0.03s
- Monthly instance costs: $459.90
- Cost to fill a block: 0.03 / (30 * 24 * 3600) * $459.90 = $0.00000532291666667
### Verdict
- zkEVM block estimated costs: $355.7624587
- PVM block estimated costs: $0.000005322916
- Roughly, zkEVM block / PVM block = 355.7624587 / 0.000005322916 ~= 66836008
- We could have around 66mil PVM validators to get to the same cost per block
- Polkadot currently has 1000 validators
- In order for zkEVM to be more cost efficitive I should have made grave mistakes that shift the numbers into the favor of the zkEVM by more than four orders of magnitudes;
- However the numbers Jan produced for risc0 are not far off from what I got here.
## Edits
**Update** 16.05.2024:
- Updated benchmark results using the latest PolkaVM version; Instantiation now counts to execute.
**Update** 01.05.2024:
- Updated benchmark results using the latest PolkaVM version lazy execution in the interpreter
**Update** 24.04.2024:
- Updated benchmark results using the latest compiler version
**Update** 23.04.2024:
- Rerun execute benchmarks with larger parameters
- Added note