The Case for an Aggressive Arithmetic Opcode Gas Pricing Model for EIP-7747

# The Case for an Aggressive Arithmetic Opcode Gas Pricing Model for EIP-7747 EIP-7747 proposes a conservative cost model for the new modular arithmetic opcodes it introduces. However, this may be more than 2x more expensive than an implementation which uses handwritten assembly for the arithmetic. Aggressive gas pricing for bit-widths with desirable use-cases can provide substantial cost reductions bringing costs for certain operations closer to precompile/native-levels. The example chosen here is BLS12381 G1/G2 point multiplication operations. The EVM implementations presented here use the standard double-and-add algorithm and omit subgroup checks. They are benchmarked using several client implementation/cost-model presets with varying implementation complexity: * `asm-mulmodx` - x86_64 assembly for multiplication * `asm` - x86_64 assembly for addition/subtraction/multiplication * `fallback` - Go reference implementation (basis for the EIP-7747 cost model and Geth implementation) The cost model for these example presets is altered for 321-384 bit moduli: | preset | `ADDMODX`/`SUBMODX` gas cost | `MULMODX` gas cost | | ---- | ---- | ---- | | `asm` | 1 | 2 | | `asm-mulmodx` | 2 | 2 | | `fallback` | 2 | 4 ## Benchmarks Benchmarks were executed on a Xeon CPU E5-2686 v4 (2.30GHz). The target gas rate is the same as the EIP: 27 ns / gas. Benchmark input is the generator point and curve sub-group order. ### Gas Costs #### G1 Mul ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/1693e443d4a4e0859de34aadb99d4a0823ae3cba/charts/bls12381-gas-g1.png?raw=true) #### G2 Mul ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/1693e443d4a4e0859de34aadb99d4a0823ae3cba/charts/bls12381-gas-g2.png?raw=true) ### Execution Time Note: [gnark-crypto](https://github.com/Consensys/gnark-crypto/tree/master/ecc/bls12-378) is the backing library for the Geth implementation of EIP-2537. Benchmarks of its implementation of double-and-add (with subgroup checks omitted) are provided for reference. It implements base field operations in assembly. #### G1 Mul ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/1693e443d4a4e0859de34aadb99d4a0823ae3cba/charts/bls12381-time-g1.png?raw=true) #### G2 Mul ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/1693e443d4a4e0859de34aadb99d4a0823ae3cba/charts/bls12381-time-g2.png?raw=true) ### Assembly Opcode and Arithmetic Benchmarks Overall opcode runtime is ~40% less for assembly implementations of the arithmetic at 321-384 bit widths. Contribution of modular addition/subtraction to overhead is negligible compared to rest of interpreter loop. ###### `ADDMODX` ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/5284df958171bc95ee9dcf47c995cffac7a136b0/charts/addmodx-asm.png?raw=true) ###### `SUBMODX` ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/5284df958171bc95ee9dcf47c995cffac7a136b0/charts/submodx-asm.png?raw=true) ###### `MULMODX` ![Alt text](https://github.com/jwasinger/evmmax-bench-plot/blob/5284df958171bc95ee9dcf47c995cffac7a136b0/charts/mulmodx-asm.png?raw=true) ## Takeaways * With the most aggressive model, gas costs for EVM G1/G2 ecmul implementation (sans missing subgroup checks) is 2.66x and 1.5x greater than precompile prices proposed in EIP-2537: * The difference is lower for G2 because ratio of field operations to EVM control flow is higher. This doesn't indicate that the aggressive model is underpriced, just that overall EVM pricing is conservative (in this specific case) compared to ecrecover, considering execution time alone. * The implementations are not exactly the same algorithm so the comparison here is not exactly apples-to-apples: * The EVM implementation of point multiplication uses [complete formulas](https://eprint.iacr.org/2015/1060.pdf) for doubling/addition which trade reduced control flow for additional field operations compared to [double](http://www.hyperelliptic.org/EFD/g1p/auto-shortw-xyzz.html#doubling-dbl-2008-s-1)/[add](https://www.hyperelliptic.org/EFD/g1p/auto-shortw-xyzz.html#addition-add-2008-s) formulas used by gnark. * gnark double-and-add ecmul uses a window optimization with a window size of 2. This specific optimization could be applied to the EVM implementation (but I haven't had time to do it). * For other bls precompiles fp-to-g1/fp2-to-g2 and pairing: These are all almost-entirely composed of field operations (with little control flow), and can be seamlessly translated to EVM implementations without the need to switch formulas for gas efficiency. * Gas/execution overhead of EVM vs native will be lower than g2 mul for pairing and mapping operations due to low EVM control-flow overhead. ## Reproducing the Benchmarks See https://github.com/jwasinger/evmmax-benchmarks for instructions.