How to calculate a more accurate L1 fee.
Special thanks to Roberto Bayardo for his excellent compression analysis repo and ideas on linear regression.
The current post-Regolith, pre-Ecotone cost function is defined as follows:

```
l1Cost = (zeroes*4 + ones*16 + overhead) * l1BaseFee * scalar / 1e6
```

where:

- `zeroes`: count of 0 bytes in the tx
- `ones`: count of non-0 bytes in the tx
- `overhead`: overhead per tx (suggested: `188`)
- `scalar`: scalar per tx (suggested: `684_000`)
- `l1BaseFee`: current L1 base fee

This is simply the EIP-2028 calldata cost multiplied by a constant scalar that represents the compression ratio over time.
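As a concrete illustration, the pre-Ecotone formula can be written in a few lines of Python (the function name is ours, and integer floor division stands in for whatever rounding the real implementation uses):

```python
def pre_ecotone_l1_cost(tx_bytes: bytes, l1_base_fee: int,
                        overhead: int = 188, scalar: int = 684_000) -> int:
    """l1Cost = (zeroes*4 + ones*16 + overhead) * l1BaseFee * scalar / 1e6"""
    zeroes = tx_bytes.count(0)
    ones = len(tx_bytes) - zeroes
    calldata_gas = zeroes * 4 + ones * 16 + overhead  # EIP-2028 counting
    return calldata_gas * l1_base_fee * scalar // 10**6
```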
Ecotone will update it to:

```
compressedTxSize = (zeroes*4 + ones*16) / 16
l1Cost = compressedTxSize * (16*l1BaseFeeScalar*l1BaseFee + l1BlobBaseFeeScalar*l1BlobBaseFee) / 1e6
```

where:

- `zeroes`: count of 0 bytes in the tx
- `ones`: count of non-0 bytes in the tx
- `l1BaseFeeScalar`: scalar to use if submitting calldata
- `l1BaseFee`: current L1 base fee
- `l1BlobBaseFeeScalar`: scalar to use if submitting 4844 blobs
- `l1BlobBaseFee`: current L1 blob fee

While the scalars in these formulas give some macro control over the economics of the chain, they don't take the compressibility of individual transactions into account. For example, a tx with calldata that contains 1000 repeated `1` values will be charged more than a transaction that contains 500 random bytes, even though the former is more compressible than the latter and costs the chain operator less to roll up.
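This mismatch is easy to check in Python (zlib stands in here for the batcher's compressor; the byte values are arbitrary):

```python
import random
import zlib

def calldata_gas(data: bytes) -> int:
    # EIP-2028 counting: 4 gas per zero byte, 16 per non-zero byte
    zeroes = data.count(0)
    return zeroes * 4 + (len(data) - zeroes) * 16

repeated = bytes([1]) * 1000                                # highly compressible
random.seed(0)
rand = bytes(random.randrange(1, 256) for _ in range(500))  # incompressible, all non-zero

gas_repeated, gas_rand = calldata_gas(repeated), calldata_gas(rand)
zip_repeated, zip_rand = len(zlib.compress(repeated, 9)), len(zlib.compress(rand, 9))
```

The repeated-byte tx is charged twice the gas of the random one, yet compresses to a tiny fraction of its size.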
This starts to incentivize pre-compression of transactions submitted to L2, as we are starting to see with 4337 transactions. On L2s execution gas is cheap and calldata is expensive, so folks are starting to use libraries like solady's LibZip to save on L1 fees. This behavior will eventually cause the chain operator to increase the scalar value to cover the cost of these less compressible transactions, essentially forcing other users to cover the DA cost increase. It will also increase L2 gas usage per block, reducing the overall chain transaction throughput.
What if we could provide a better estimate of what a transaction actually costs to roll up? It is tricky to get a perfectly accurate estimate, because the L1 cost function is run with only the transaction as input, while in reality the batcher compresses many transactions together in a batch, which yields better compression ratios.
However, we can do better than the naive EIP-2028 approach above. We can introduce an efficient compression estimator into the fee calculation that gives a more accurate estimate of the actual cost to roll up a transaction.
This compression estimator needs to be efficient as we don't want to slow down the execution engine with compression tasks. It also needs to have an implementation available in Solidity so we can keep the GasPriceOracle calculation in sync with the execution engine implementation.
FastLZ is a "Small & portable byte-aligned LZ77 compression" algorithm. The advantage of this algorithm is that it is simple and efficient, and has an audited Solidity implementation in the solady
library.
We compared FastLZ with running individual transactions through zlib at max compression, and while zlib results in better compression ratios, we found it did not perform dramatically better for compressibility estimation. Both FastLZ and zlib use a variation of LZ77. Given the batcher uses zlib, it's unlikely that some other compression algorithm will improve accuracy, so FastLZ seems to be a good choice for this use case.
There's a reference implementation of FastLZ in Golang here, and a more efficient length-only version here (we only need an estimate of the compression length, not the actual compressed data).
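FastLZ itself is not in the Python standard library, but the length-only idea can be sketched with a simplified greedy, byte-aligned LZ77 estimator. To be clear, this is an illustration of the technique, not the FastLZ wire format: the 3-byte match cost, 4-byte minimum match, and prefix hash table are our assumptions.

```python
def lz77_estimate_len(data: bytes, window: int = 8192, min_match: int = 4) -> int:
    """Greedy LZ77 size estimate: 1 byte per literal, an assumed 3 bytes
    per back-reference. Returns only the length, never the compressed data."""
    i, n, out = 0, len(data), 0
    table = {}  # 4-byte prefix -> most recent position seen
    while i < n:
        key = data[i:i + min_match]
        j = table.get(key, -1)
        if len(key) == min_match and 0 <= j and i - j <= window:
            # extend the match as far as it goes (overlapping matches allowed)
            l = 0
            while i + l < n and data[j + l] == data[i + l]:
                l += 1
            out += 3  # one back-reference token
            # remember the prefixes inside the matched region
            for k in range(i, min(i + l, n - min_match + 1)):
                table[data[k:k + min_match]] = k
            i += l
        else:
            if len(key) == min_match:
                table[key] = i
            out += 1  # one literal byte
            i += 1
    return out
```

Repetitive input collapses to a few bytes, while unique bytes cost one byte each, which is the behavior the fee estimator needs.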
We wanted to test the improved accuracy of the new L1 cost formula, so we took the following approach:
Note: we calculate the "best estimate" using a dual setup of zlib writers at the highest compression level. We aim to keep one of the writers always between 64kb and 128kb in size, simulating the compression achieved in a 128kb batch.
Running all of the transactions on OP Mainnet since Bedrock through this algorithm spat out an estimate of 19,929MB. The actual amount of calldata written is 20,222MB, meaning the estimate is off by only 1.5%, validating the method's accuracy.
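The dual-writer scheme can be sketched in Python with zlib. The details here are our assumptions about the setup: thresholds are applied to input bytes, and each tx's marginal size is measured by finalizing a clone of the stream before and after the write.

```python
import zlib

class _Writer:
    """One zlib stream plus bookkeeping of bytes in and out."""
    def __init__(self):
        self.c = zlib.compressobj(9)  # highest compression level
        self.emitted = 0              # compressed bytes produced so far
        self.fed = 0                  # uncompressed bytes written in

    def write(self, data: bytes):
        self.emitted += len(self.c.compress(data))
        self.fed += len(data)

    def total(self) -> int:
        # finalized stream size if we stopped now; copy() keeps the stream open
        return self.emitted + len(self.c.copy().flush())

def marginal_sizes(txs, low=64 * 1024, high=128 * 1024):
    """Marginal batch-compressed size of each tx, measured against a writer
    that, once warm, always holds between `low` and `high` bytes of history."""
    primary, secondary = _Writer(), None
    out = []
    for tx in txs:
        before = primary.total()
        primary.write(tx)
        if secondary is not None:
            secondary.write(tx)
        out.append(primary.total() - before)
        if secondary is None and primary.fed >= low:
            secondary = _Writer()                 # start the replacement writer
        if primary.fed >= high:
            primary, secondary = secondary, None  # rotate writers
    return out
```

Because the primary writer carries at least 64kb of history, each tx is priced against realistic batch context rather than in isolation.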
Why RMSE? RMSE penalizes outliers more heavily, so it captures things like adversaries potentially trying to exploit weaknesses. We also looked at mean-absolute-error (MAE) and found that the two measures were correlated: estimators that performed well using RMSE also performed well against MAE.
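For reference, the two error measures over a list of per-tx estimation errors, showing how a single outlier moves RMSE much more than MAE:

```python
import math

def rmse(errors):
    """Root-mean-square error: squares each error, so outliers dominate."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error: every error counts linearly."""
    return sum(abs(e) for e in errors) / len(errors)

smooth = [1, 1, 1, 1]  # uniform errors: rmse == mae == 1
spiky = [1, 1, 1, 9]   # one outlier: rmse jumps well above mae
```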
You can see the RMSE of the FastLZ algorithm performs better than the EIP-2028 version. However, we still see a lot of spikes in the chart. These spikes are spaced 14 days apart and are likely due to Worldcoin activity on OP, whose transactions are inherently incompressible.
We decided to apply a linear regression to the data to attempt to find a more accurate estimator. This has two advantages:
Using the "best estimate" above as the dependent variable, we chose to run three separate regressions over the data, using the following sets of independent variables:

- `[zeroes, ones]` (checking whether a linear regression over the EIP-2028 variables would improve performance without FastLZ)
- `[fastlz_length]`
- `[fastlz_length, uncompressed_tx_size]`
We performed the regression over a subset of the data: transactions from the month of October 2023, which includes some of the Worldcoin distribution shift. We then generated RMSE charts across the entire dataset, which allowed us to validate that the model still holds through larger changes in the tx data.
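A regression of this shape can be run with ordinary least squares via the normal equations. The pure-Python sketch below is a stand-in for whatever tooling the actual analysis script used, and the data in the usage example is synthetic:

```python
def ols(X, y):
    """Ordinary least squares: X is a list of rows, each starting with a 1.0
    intercept column; returns the coefficient vector [intercept, b1, ...]."""
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta
```

For example, fitting rows of `[1, fastlz_length, uncompressed_tx_size]` against the best-estimate sizes recovers the intercept and the two coefficients.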
The [fastlz_length, uncompressed_tx_size]
regression (in orange above) performed the best, almost wiping out those spikes in errors seen in the other charts.
The models calculated in these cases are as follows:

```
v = 8.570750093473123 - 0.01778946*zeroes + 0.87830521*ones
v = -35.60342890895569 + 0.90558897*fastlz_length
v = -27.321890037208703 + 1.03146206*fastlz_length - 0.08866427*uncompressed_tx_size
```
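For convenience, the three fitted models as Python functions; each returns `v`, the estimated batch-compressed size of a tx in bytes (function names are ours):

```python
def model_zeroes_ones(zeroes, ones):
    """Regression over the EIP-2028 variables only."""
    return 8.570750093473123 - 0.01778946 * zeroes + 0.87830521 * ones

def model_fastlz(fastlz_length):
    """Regression over the FastLZ-compressed length only."""
    return -35.60342890895569 + 0.90558897 * fastlz_length

def model_fastlz_size(fastlz_length, uncompressed_tx_size):
    """The best-performing model: FastLZ length plus uncompressed size."""
    return (-27.321890037208703
            + 1.03146206 * fastlz_length
            - 0.08866427 * uncompressed_tx_size)
```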
We wanted to validate some other techniques that didn't involve FastLZ, in case there was a simpler implementation that would yield good results. A few variations were tested:
You can see a full list of the techniques we tried here. While some of these improved on the original algorithm, we didn't find any that performed better than the FastLZ algorithm.
We recommend switching to the following formula for the new L1 cost function:

```
estimatedSize = intercept + fastlzCoef*fastlzLength + uncompressedTxCoef*uncompressedTxSize
l1Cost = estimatedSize * (16*l1BaseFeeScalar*l1BaseFee + l1BlobFeeScalar*l1BlobBaseFee) / 1e12
```

where (most variables scaled by `1e6` for integer arithmetic purposes):

- `l1BaseFeeScalar = 11_111` (~= `7600/0.684`)
- `l1BlobFeeScalar = 1_250_000` (~= `862000/0.684`)
- `l1BaseFee`: current L1 base fee
- `l1BlobBaseFee`: current L1 blob fee
- `intercept = -27_321_890`: intercept (a.k.a. constant) calculated by the linear regression
- `fastlzCoef = 1_031_462`: coefficient of the `fastlzLength` term
- `fastlzLength`: length of the FastLZ-compressed, RLP-encoded signed tx
- `uncompressedTxCoef = -88_664`: coefficient of the `uncompressedTxSize` term
- `uncompressedTxSize`: length of the uncompressed, RLP-encoded signed tx

The `l1BaseFeeScalar`/`l1BlobFeeScalar` will give the chain operator some ability to tweak the rollup costs should the tx traffic shape change. They are calculated from the L1 fee scalars doc here, divided by the assumed `0.684` compression ratio.
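Putting the recommendation together in Python (the zero floor on the size estimate and the integer floor division are assumptions of this sketch, not part of the recommendation):

```python
def recommended_l1_cost(fastlz_length: int, uncompressed_tx_size: int,
                        l1_base_fee: int, l1_blob_base_fee: int,
                        l1_base_fee_scalar: int = 11_111,
                        l1_blob_fee_scalar: int = 1_250_000) -> int:
    """Recommended L1 cost function; coefficients and scalars are scaled
    by 1e6, hence the final division by 1e12."""
    intercept = -27_321_890
    fastlz_coef = 1_031_462
    uncompressed_tx_coef = -88_664
    # estimated batch-compressed size of the tx (scaled by 1e6), floored at 0
    estimated_size = max(0, intercept
                         + fastlz_coef * fastlz_length
                         + uncompressed_tx_coef * uncompressed_tx_size)
    # weighted fee per compressed byte across calldata and blob DA (scaled by 1e6)
    fee_scaled = (16 * l1_base_fee_scalar * l1_base_fee
                  + l1_blob_fee_scalar * l1_blob_base_fee)
    return estimated_size * fee_scaled // 10**12
```

Unlike the EIP-2028-based formulas, the fee now grows with the FastLZ-compressed length, so a highly compressible tx pays less than an incompressible one of the same size.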
The linear regression calculation script can be found here (please excuse the lack of correct Python style).