
# Current performance of EIP-2537 and EVM384

Recent progress reports described many improvements to the execution speed of EVM384, based on the EVMOne implementation of the EVM. They claim that an implementation of the Miller loop plus final exponentiation for a pairing of two pairs takes around 4.8 ms. That is a great result, but unfortunately it can only be considered a wonderful academic achievement that has nothing to do with the final cost of this operation to end users.

From the perspective of heavy cryptography, there is a fundamental difference between a precompile and an EVM384-based contract:

- In a precompile, the final cost covers everything:
  - parsing
  - input validation
  - control flow (fast and native)
  - arithmetic (fast and native)
- An EVM384-based contract pays separately for:
  - arithmetic (reasonably fast at the moment), which is the main focus of EVM384
  - control flow (default EVM prices); let's assume that this includes parsing and input validation too

One can reasonably assume that the arithmetic part of EVM384 can be made fast enough for the cost of each such opcode to be set to the minimum possible amount of `1` unit of gas.

## Measured gas costs

The focus of this report is not on the achieved execution time of cryptographic operations expressed in native code or EVM384, but on the proper approach to measuring the gas cost of such operations. At the end of the day, transactions spend gas, not execution time.

To assemble concrete numbers, this [instruction](https://gist.github.com/jwasinger/41b890bc5a01b7abd4f75ae70f0641f1#file-instructions-md) was used to build a Geth node with EVM384-v7 (the Go 1.15 compiler was used). Extra code was added to count the number of EVM384 operations. Here we do not care about the final performance in terms of execution time; we only count the number of operations of each kind.

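
The counting idea can be illustrated with a minimal sketch. This is not the code that was added to Geth; the opcode names (`ADDMOD384`, `SUBMOD384`, `MULMODMONT384`), the `tally` helper and the toy trace are assumptions made for illustration only. It tallies executed opcodes into "EVM384 arithmetic" and "everything else", together with the gas charged for each group.

```python
from collections import Counter

# Assumed names for the three EVM384 arithmetic opcodes (illustrative only).
EVM384_OPS = {"ADDMOD384", "SUBMOD384", "MULMODMONT384"}


def tally(trace):
    """Split an executed-opcode trace into EVM384 arithmetic and control flow.

    `trace` is an iterable of (opcode_name, gas_charged) tuples.
    Returns two Counters: operation counts and gas totals, keyed by kind.
    """
    counts = Counter()
    gas = Counter()
    for opcode, charged in trace:
        kind = "arithmetic" if opcode in EVM384_OPS else "control_flow"
        counts[kind] += 1
        gas[kind] += charged
    return counts, gas


# Made-up trace, not a real program: EVM384 operations are charged 2 gas each,
# the other opcodes use ordinary EVM prices.
trace = [
    ("PUSH1", 3), ("PUSH1", 3), ("MULMODMONT384", 2),
    ("PUSH1", 3), ("ADDMOD384", 2),
    ("PUSH1", 3), ("JUMPI", 10),
]

counts, gas = tally(trace)
print(counts["arithmetic"], gas["arithmetic"])      # 2 4
print(counts["control_flow"], gas["control_flow"])  # 5 22
```

Even in this toy trace most of the gas goes to the non-arithmetic opcodes, which is the effect the measurements in this report quantify.
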
The synthetic benchmark code used `118800` additions/subtractions/multiplications in total (priced at `2` gas each at the moment, i.e. `237600` gas) and spent `602533` gas, so `364933` gas was spent on executing non-arithmetic (non-EVM384) operations (let's continue to call this "control flow"). The synthetic benchmark itself is expected to simulate the number of operations required for a pairing of two pairs of points.

Independently, a few implementations of EIP-2537 were benchmarked on a 2.9 GHz Intel Core i9 machine with turbo boost disabled:

- Rust bindings to the [BLST](https://github.com/supranational/blst) library. The original BLST is a pure C implementation
- A pure Rust implementation based on [EIP-1962](https://github.com/matter-labs/eip1962)

In addition to benchmarking the full EIP-2537 procedure for a pairing of two pairs (parsing the byte array, ABI validation, and subgroup checks), the execution time was measured for a pure pairing of two points in affine coordinates using the EIP-1962 library. This is necessary because the synthetic EVM384 benchmark does not account for subgroup checks of the input points, although such checks are required by the very definition of the "pairing" operation and by EIP-2537.

A constant of `30MGas/second` was used to convert execution time into gas; it is the reference constant for the machine used for benchmarking. Since all operations are CPU-intensive, one can quite precisely project the benchmark numbers onto CPUs with higher or lower frequencies, or onto cases with turbo boost active.

The final execution costs in units of gas are shown in the plot below.

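
The two pieces of arithmetic used in this section (splitting the measured gas into arithmetic and control flow, and converting execution time into gas) can be written down as a short sketch. The function names are made up for illustration, and the 1 ms input to the conversion is a hypothetical value, not a measurement from this report.

```python
ARITH_OPS = 118_800          # EVM384 add/sub/mul operations counted in the benchmark
GAS_PER_ARITH_OP = 2         # current price per EVM384 operation
TOTAL_GAS = 602_533          # total gas spent by the synthetic benchmark
GAS_PER_SECOND = 30_000_000  # reference constant: 30 MGas/second


def control_flow_gas(total_gas, arith_ops, gas_per_op):
    """Gas left over after subtracting the EVM384 arithmetic part."""
    return total_gas - arith_ops * gas_per_op


def time_to_gas(seconds, gas_per_second=GAS_PER_SECOND):
    """Convert native execution time into gas at the reference rate."""
    return round(seconds * gas_per_second)


arith_gas = ARITH_OPS * GAS_PER_ARITH_OP
cf_gas = control_flow_gas(TOTAL_GAS, ARITH_OPS, GAS_PER_ARITH_OP)

print(arith_gas)           # 237600
print(cf_gas)              # 364933
print(time_to_gas(0.001))  # hypothetical 1 ms of native execution -> 30000
```

At the reference rate every millisecond of native execution corresponds to `30000` gas, which is how the precompile timings are put on the same axis as the EVM384 gas counts.
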
## Results

![](https://i.imgur.com/lFYIw6c.png)

On the plot above one should pay attention to the following:

- The EVM384 total gas cost is projected using `1` gas (the minimum amount) per EVM384 operation (add/sub/montmul) instead of the `2` gas that is currently in the source code
- EVM384 control flow uses the current gas costs for opcodes, which cannot be modified assuming that existing EVM operations are properly priced (taking into account the different implementations in Geth/Nethermind/Besu/OpenEthereum)
- Conforming to the EIP-2537 spec takes an extra 20% of time on top of the pure pairing execution for a similar input
- The current EIP-2537 price has a significant margin for adjustment and can potentially be lowered further (prices are taken after this [PR](https://github.com/ethereum/EIPs/pull/3077) with price adjustments)
- Control flow alone in the EVM384-based approach (without the arithmetic part) takes twice as much gas as the current EIP-2537 spec (an upper bound across various implementations plus some margin)
- A full EVM384 implementation of the BLS12-381 pairing, even without subgroup checks, is more than `10` times more expensive than the BLST implementation (which includes subgroup checks)

Even though the EVM384 numbers are based on synthetic benchmark code (which simulates the number of operations one would execute in a meaningful BLS12-381 pairing computation for two pairs of points), it is reasonable to assume that this number will not change much (and it is even more likely that the control flow contribution will not decrease). Taking into account the extra work required on top of the pairing execution to conform to the EIP-2537 spec (so that operations are well-defined and safe for an end user), it is even more reasonable to assume that the gas cost of an EVM384-based approach is going to be `10` times larger than the cost that can be set for EIP-2537 if the BLST library is used as a standard.

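
The projection behind the EVM384 bar can be reproduced from the two measured numbers. The sketch below is illustrative only: `projected_total` is a made-up helper, and the `0` gas price is a hypothetical lower bound that is not proposed anywhere. It shows how little the total moves when only the arithmetic price changes.

```python
ARITH_OPS = 118_800         # EVM384 operations counted in the synthetic benchmark
CONTROL_FLOW_GAS = 364_933  # gas spent on non-EVM384 opcodes (measured)


def projected_total(control_flow_gas, arith_ops, gas_per_op):
    """Total gas if every EVM384 operation costs `gas_per_op`."""
    return control_flow_gas + arith_ops * gas_per_op


# 2 = current price, 1 = minimum price, 0 = hypothetical "free arithmetic" bound
for price in (2, 1, 0):
    total = projected_total(CONTROL_FLOW_GAS, ARITH_OPS, price)
    share = CONTROL_FLOW_GAS / total
    print(f"{price} gas/op: total={total}, control flow share={share:.1%}")

# totals: 602533 at 2 gas/op, 483733 at 1 gas/op, 364933 at 0 gas/op
```

At `1` gas per operation, control flow already accounts for roughly three quarters of the total, so further speedups of the arithmetic opcodes cannot bring the cost below the `364933` gas floor.
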
## Conclusion

In conclusion, the only reasonable approach to including BLS12-381 operations in Ethereum is the following:

- Add EIP-2537 as a precompile. It is ready, and it is most likely the highest-quality precompile ever introduced
- Name BLST the standard library for BLS12-381 operations in Ethereum implementations (the BLST library is audited and formally verified)
- Further reduce the cost of EIP-2537 once BLST is included in the major clients, similarly to how the `secp256k1` library is used for signature verification

BLS12-381 operations are expected to be in great demand for various applications (sorted by the expected frequency of invocation per block, in descending order):

- Account abstraction (controlling an account using code, e.g. code that verifies a BLS signature, either threshold or aggregate)
- Rollups (BLS signature by multiple operators on a new block)
- ZKRollups (BLS signature by multiple operators on a new block + ZKSNARK verification over the BLS12-381 curve)
- On-chain randomness (Chainlink, DRAND)
- Eth2.0 light client (verification of the BLS signature to check for the canonical chain)
- Eth2.0 deposit contract (verification of the user's public key and their knowledge of the corresponding private key)

Many of these applications are expected to send one or more transactions every block (on average), and account abstraction requires that pre-execution verification (the code that controls the account) does not exceed some fixed amount of gas. It is therefore crucial to pay attention to the cost of BLS12-381 operations, and the only reasonable approach is to use native code (a precompile) and eventually include BLST in all major clients.