# EIP-4844 Implementer notes

[EIP-4844](https://eips.ethereum.org/EIPS/eip-4844) introduces "blob transactions", requiring changes in both the execution and consensus layers. See the [meta-spec](https://hackmd.io/@protolambda/eip4844-meta) for all resources. This doc is meant as an implementation guide, clarifying "implementation details" of the different specs. The initial write-up, by @protolambda, focuses on KZG performance.

## KZG

Learn what KZG is here: [KZG polynomial commitments, by Dankrad Feist](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html)

In EIP-4844 we use this to commit to the "blob of data". This is slower to compute than just hashing the data, but good for other things:

- The proof size, for one or more values, is always 48 bytes (a compressed BLS12-381 G1 point)
- The polynomial commitment integrates very nicely with the data-recovery part of data-availability sampling, required for full sharding (in the danksharding design)

We hide the complexity from the transaction itself by hashing the KZG commitment: this way we can version and abstract it away, and change the hash pre-image to a different type of commitment (or one with different parameters) later on.

`versioned_hash = prefix_byte ++ hash(compress_serialize_g1(commit_to_poly(blob)))[1:]`

The costly part is `commit_to_poly`, which is the evaluation of the polynomial at a secret point `s`. We compute this with a linear combination of `KZG_SETUP_LAGRANGE` and the blob.

### KZG Libraries

Q: Do I need a whole library to support blobs?

A: No! KZG is super simple. Just implement:

1. a linear combination to compute a KZG commitment
2. a single pairing verification to verify a KZG proof

The bare-bones version: [geth datablobs prototype `crypto/kzg/kzg.go`](https://github.com/protolambda/go-ethereum/blob/datablobs/crypto/kzg/kzg.go).

One feature specific to EIP-4844 that the go-kzg libraries do not have (yet): batch-verification.
More about that in the [Optimizations](#optimizations) section.

Q: Are there libraries for the lazy buidler? Maybe I build fancy stuff on top of KZG, prototype sharding?

A: Yes! Commitments, proofs, optimized multi-proof generation, and data-extension and data-recovery for sampling are all there already.

- [`go-kzg`](https://github.com/protolambda/go-kzg) by Proto, based on python research code from Dankrad. It has two BLS options:
  - It defaults to [`kilic/bls12-381`](https://github.com/kilic/bls12-381) for BLS. Go-ethereum [adopted an older copy of that for previous BLS precompile work](https://github.com/ethereum/go-ethereum/tree/master/crypto/bls12381). `kilic/bls12-381` is nice because it uses the [Go assembler](https://go.dev/doc/asm) for its super simple toolchain and portability (beware though, kilic BLS is not written for 32-bit).
  - It also supports [`herumi/bls-eth-go-binary`](https://github.com/herumi/bls-eth-go-binary/), a BLS library Prysm used early on. It's relatively slow, but there for comparison.
  - BLST is not supported; the Go bindings are insufficient and buggy for integration (might revisit this in the future).
- [`c-kzg`](https://github.com/benjaminion/c-kzg) by [Ben Edgington](https://twitter.com/benjaminion_xyz/), based on `go-kzg`. Uses [BLST](https://github.com/supranational/blst/), a highly optimized BLS library used by all Eth2 clients today. Note: there are also Java bindings to c-kzg available. Other higher-level languages may also want to wrap the C library.

### Benchmarks

#### Blob to commitment

The most essential benchmark is "time to go from blob to commitment", a.k.a. a G1 linear combination, with maybe some decoding/encoding work.

In EIP-4844 we use 4096 field elements per blob, so that's what we benchmark. `log2(4096) = 12`; the benchmarks sometimes cover different "scales", and scale 12 is what matters here.
##### c-kzg

```shell
git clone git@github.com:benjaminion/c-kzg.git
git clone git@github.com:supranational/blst.git
cd blst && ./build.sh
cd ../c-kzg
cp ../blst/libblst.a lib/
cp ../blst/bindings/*.h inc/
cd src
make lib
lscpu | grep "Model name"
make kzg_proofs_bench
```

Collected outputs:

```
AMD Ryzen 9 5950X 16-Core Processor
commit_to_poly/scale_12 42376608 ns/op
```

##### go-kzg

Using Go 1.17:

```shell
git clone git@github.com:protolambda/go-kzg.git
cd go-kzg
# Kilic BLS (default)
go test -run '^$' -bench BenchmarkCommit -benchtime=1s -v
# Herumi BLS
go test -tags=bignum_hbls -run '^$' -bench BenchmarkCommit -benchtime=1s -v
```

Collected outputs:

```
AMD Ryzen 9 5950X 16-Core Processor
# Kilic BLS
BenchmarkCommit/scale_12-32   198   57024504 ns/op
# Herumi BLS
BenchmarkCommit/scale_12-32   100   101043962 ns/op
```

Notes:

- I think I can make Kilic BLS run faster, or at least thrash less memory, with Go-specific optimizations. But I am not a cryptographer, so I'm not making any changes without review :)
- Both go-kzg and c-kzg are single-threaded. At 40-60 ms it may be worth parallelizing the work.
- If you have a different machine, please run the same benchmarks and report the results; more data is nice to have!

### Optimizations

40-60 ms for computing a KZG commitment to a blob is not great:

- We have up to 2 blobs per tx (`MAX_BLOBS_PER_TX`) and 16 blobs per block (`MAX_BLOBS_PER_BLOCK`). Even when everything is valid, that is a lot of work.
- There is a whole transaction pool to verify.
- Transactions may carry valid blob data, but an invalid KZG commitment, or sign over a different versioned hash. Such a tx is invalid and will not pay a fee: a DoS risk.

We can optimize by doing two things:

- By distributing the KZG commitment along with the input data, we only have to verify that commitment and inputs match.
- Similar to batch-verification of aggregated attestations in Eth2 to save cost, we can batch-verify the commitments of blobs.
Description by [George](https://github.com/asn-d6):

> Batch verification works with multi-scalar-multiplication (MSM):
> The MSM would look like this (for three blobs with two field elements each):
> ```
> r_0(b0_0*L_0 + b0_1*L_1) + r_1(b1_0*L_0 + b1_1*L_1) + r_2(b2_0*L_0 + b2_1*L_1)
> ```
> which we would need to check against the linear combination of commitments:
> ```
> r_0*C_0 + r_1*C_1 + r_2*C_2
> ```
> In the above:
> - `r` are the random scalars of the linear combination
> - `b0` is the first blob, `b0_0` is the first element of the first blob.
> - `L` are the elements of the `KZG_SETUP_LAGRANGE`
> - `C` are the commitments
>
> By regrouping the above equation around the `L` points we can reduce the length of the MSM further
> (down to just `n` scalar multiplications) by making it look like this:
> ```
> (r_0*b0_0 + r_1*b1_0 + r_2*b2_0) * L_0 + (r_0*b0_1 + r_1*b1_1 + r_2*b2_1) * L_1
> ```

Essentially we multiply each blob, as well as its commitment, with a secret random scalar, so we can sum them up safely *before verifying*. The check then reduces to two `bls.LinCombG1` calls:

- multiplying the aggregated blob elements with `KZG_SETUP_LAGRANGE`
- multiplying the random scalars used during aggregation with the commitments

### Benchmarking Batch-verification

Go 1.17, Kilic BLS, [`kzg.VerifyBlobs` in the geth prototype](https://github.com/protolambda/go-ethereum/blob/6e4051e1e347a00461d079a67eb3f7c856bc3afe/tests/kzg_bench_test.go#L28):

```
cpu: AMD Ryzen 9 5950X 16-Core Processor
# 16 blobs:
BenchmarkVerifyBlobs-32   20   71193890 ns/op
# 128 blobs:
BenchmarkVerifyBlobs-32   13   114886742 ns/op
```

I.e. from `57` to `71` milliseconds, but now verifying 16x as many blobs! And from `57` to `114` for 128x as many. *Batching is important.*

### Memory optimization

To aggregate with constant memory, the blob elements and commitments can be aggregated on the fly, i.e. we add the scaled blobs and commitments to running aggregates instead of aggregating at the very end.
This is a trade-off between memory and being able to optimize the linear combinations.

## Tx Pool

Ethereum fetches txs with the `devp2p` protocol, many at a time. However, in the case of go-ethereum it does not score down peers that return bad transactions (as far as I understood from quick reverse-engineering), and it verifies the transactions single-threaded, one at a time. Assuming the verification of regular txs is cheap, that's not really a problem.

For blob transactions we have to make some changes though:

- Refactor the processing of received txs to do one step at a time for all txs, instead of all steps for one tx. I.e. introduce "stages".
- Introduce a "stage" that does batch-verification of all transactions that were just received from the same peer. If anything is wrong with any of them, the peer we received them from should be held accountable (peer scoring/banning).
- If batch-verification fails but we still trust the peer, we can batch-verify the blobs of each individual transaction to isolate the invalid ones.

To aggregate many transactions without increasing memory too much, aggregating on the fly should help (parsing all blobs and commitments into deserialized `[]Fr` and `G1Point` instances doubles the memory).

Staged validation, and the blobs-verification stage, have been implemented in go-ethereum [here](https://github.com/protolambda/go-ethereum/blob/6f21779e7bc717ebb552625b7e76401b84745527/core/tx_pool.go#L1014). Peer-scoring was not changed; I need feedback on what is already there to integrate with.

## BlobsSidecar

Similar to the transaction pool in the execution layer, the consensus layer also has to verify bundles of commitments and blob data, known as the `BlobsSidecar`.

During gossip of sidecars, however, only *signed sidecars* are distributed: we only have 1 beacon block per 12 seconds, and only 1 sidecar of blobs. So even if costly, the proposer signature check avoids DoS nicely.
However, because blobs stay available for longer than the sidecar signatures (and the signatures will definitely not be available with sharding, when blobs are recovered from samples), there are still other places to DoS: the request-response (a.k.a. sync) protocol.

In that case we want to batch-verify all blobs in a single sidecar. And when requesting multiple sidecars from the same peer, e.g. using `blobs_sidecars_by_range` v1, we can *aggregate* all (or e.g. 20 at a time) received sidecars and batch-verify them all at once!

Again, when batch-verifying a lot of data (100 sidecars = 200 MiB worst case) it makes sense to *aggregate on the fly* to keep memory constant, and write the blobs to some temporary disk space until fully verified.