# op-batcher concurrent alt-da requests design doc

Design doc to be submitted to OP Labs, either unofficially in their discord server, or officially in their [design-docs](https://github.com/ethereum-optimism/design-docs) github page.

Main authors:
- Samuel Laferriere (EigenLabs): samlaf@eigenlabs.org
- Bowen Xue (EigenLabs): bxue@eigenlabs.org
- Karl Bartel (Celo): karl.bartel@clabs.co

Suggested Reviewers:
- OP Labs team (not sure who exactly)
- Base team (Robert Bayardo, Michael de Hoog)
- Celo, MegaETH, Mantle, Polymer teams

# Timeline and TLDR

Our main goal is to get the [main proposed change 1](#1-Main-fix-run-da-server-calls-in-goroutines-implemented) upstreamed asap, ideally before the end of September. The rest of this doc contains other additions, features, or refactors that we deem would be nice to have for the maintainability and extensibility of the code, but that don’t have any clear deadline for being implemented or merged.

TLDR:
1. [PR](https://github.com/ethereum-optimism/optimism/pull/11698) for [concurrent da server requests](#1-Main-fix-run-da-server-calls-in-goroutines-implemented) (ready for review; make sure to look at the open questions section below)
2. optional [Proposal](#2-Multi-commitment-txs-not-implemented---future-nice-to-have) for adding multi-commitment txs (not a priority for us at the moment)
3. optional (wip) [PR](https://github.com/Layr-Labs/optimism/pull/9) for an [event loop refactor](#Suggested-Event-Loop-Refactor), which we deem would improve code readability

# Glossary

- `alt-da`: a system external to ethereum that is used for storing rollup batches
- `eth-da`: data posted on ethereum itself, used to distinguish it from data on alt-da. eth-da can either contain rollup batches directly (in normal mode), or commitments to alt-da blobs (which contain the rollup batches)

# Problem Summary

Many rollup teams are using op-batcher in alt-da mode and posting data to eigenDA. The problem they are running into is that the write to the DA layer is blocking and is done sequentially in the main driver loop. Writes to eigenDA currently take 10+ minutes to return because blobs are batched before getting dispersed to nodes. This blocks the pipeline for that amount of time, limiting the rollup’s throughput to `blob size / request_time` (for example, a 1 MB blob returned every 10 minutes works out to under 2 KB/s of batch data).

The main change this design doc proposes is to allow publishing multiple blobs to the da-layer simultaneously, the same way that setting [maxPendingTxs](https://docs.optimism.io/builders/chain-operators/configuration/batcher#max-pending-tx) ≠ 1 (0 means no limit) allows multiple eth-da txs to be in flight to ethereum at once.

# OP-Batcher Context

The op-batcher, at a very high level, [polls](https://github.com/ethereum-optimism/optimism/blob/db21f4aa016c536e4602b7b2f5effa4ac8cf35ba/op-batcher/batcher/config.go#L49) the sequencer periodically for new L2 blocks, transforms them into [batches](https://specs.optimism.io/protocol/derivation.html#batch-format), and [adds those](https://github.com/ethereum-optimism/optimism/blob/db21f4aa016c536e4602b7b2f5effa4ac8cf35ba/op-batcher/batcher/driver.go#L357) into [channels](https://specs.optimism.io/glossary.html#channel). Channels are then compressed and eventually split into [frames](https://specs.optimism.io/protocol/derivation.html#frame-format) of size ~128KB, and frames are sent to Ethereum using one of two types of transactions:

1. calldata transaction: sends 1 frame as calldata
2. blob transaction: sends up to 6 frames (currently 5 are used) as blob payloads
[alt-da mode](https://specs.optimism.io/experimental/alt-da.html) allows posting frames to an alt-da layer (eigenda, celestia, avail, any DAC, etc.) instead of ethereum, and only posting the commitment to that data on ethereum. The current implementation only allows packing 1 commitment into a single calldata tx.

| | only eth-da | alt-da + eth-da |
| --- | --- | --- |
| calldata tx (128KB) | 1 tx = 1 frame = 128KB | 1 tx = 1 commitment to a frame (of arbitrary size) |
| blob tx (≤6*128KB) | 1 tx = ≤6 frames = 6 * 128KB | N/A |

Most of the op-batcher codebase nomenclature (flags, variables, function names, etc.) is very ethereum-da centric. An ideal solution would decouple alt-da from ethereum-da and allow flexible combinations of both, filling in the above matrix. That is, multiple alt-da commitments could be submitted in a single calldata tx, or possibly even a blob tx, whichever is cheapest.

## Current Architecture

![IMG_0B731E27080C-1](https://hackmd.io/_uploads/BJv5Sjk2R.jpg)

The current [op-batcher](https://github.com/ethereum-optimism/optimism/tree/develop/op-batcher/batcher) code, at a high level, is structured as two main goroutines:

1. main “event” loop, which on every poll interval (polling is the only event in the main loop):
    1. gets all latest blocks (except on error) from the sequencer and [loads](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/driver.go#L357) them into the channelManager
        1. on reorg, publishes all currently stored blocks to L1 before continuing (needed in case the chain re-reorgs back to the current head)
        2. other errors are silently ignored
    2. in a [sequential](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/driver.go#L430) for loop, until all blocks in the channelManager have been processed (or an error happens):
        1. if pendingChannels have enough unsubmitted frame(s) to fill an L1 tx (whether calldata or blob), outputs the [frame(s)](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/channel_manager.go#L168) and [submits](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/driver.go#L506) the tx to L1
        2. [creates](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/channel_manager.go#L178) a new channel if the current channel is full
        3. [adds](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/channel_manager.go#L182) as many blocks as fit into the current channel
        4. compresses the channel data and [outputs](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/channel_manager.go#L191) as many frames as possible
        5. returns enough frame(s) to fill an L1 tx and [submits](https://github.com/ethereum-optimism/optimism/blob/f20b92d3eb379355c876502c4f28e72a91ab902f/op-batcher/batcher/driver.go#L506) the tx to L1
    3. when the alt-da flag is [turned on](https://github.com/ethereum-optimism/optimism/blob/3542398896d9faca6b379fe67e3985d722cf80b6/op-batcher/batcher/driver.go#L573), data is submitted to the alt-da server, and the [commitment](https://github.com/ethereum-optimism/optimism/blob/3542398896d9faca6b379fe67e3985d722cf80b6/op-batcher/batcher/driver.go#L583) is submitted to Ethereum
        1. note that the “DA Publisher” in the image doesn’t actually exist; submission to alt-da and to the tx queue is all currently done as part of [sendTransaction](https://github.com/ethereum-optimism/optimism/blob/3542398896d9faca6b379fe67e3985d722cf80b6/op-batcher/batcher/driver.go#L553)
2. tx receipts listener, which
    1. deletes channels after all of their frames have been successfully submitted
    2. queues frames for resubmission when errors happen
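To make the blocking behavior concrete, here is a heavily simplified, self-contained sketch of the alt-da publishing flow described above. The names (`fakeDAClient`, `nextTxData`) are illustrative stand-ins, not the actual op-batcher APIs; the point is that the DA put happens inline, so a slow DA response stalls the whole loop.

```go
// Simplified sketch, NOT the real op-batcher code: fakeDAClient and nextTxData are
// illustrative stand-ins for the alt-da client and channelManager calls linked above.
package main

import (
	"context"
	"fmt"
	"io"
	"time"
)

// fakeDAClient stands in for the alt-da server client used inside sendTransaction.
type fakeDAClient struct{}

func (fakeDAClient) Put(ctx context.Context, data []byte) ([]byte, error) {
	time.Sleep(3 * time.Second) // with EigenDA this is 10+ minutes today
	return []byte("commitment"), nil
}

// nextTxData stands in for pulling the frames for one L1 tx out of the channelManager;
// it returns io.EOF when there is nothing left to publish.
func nextTxData(remaining *int) ([]byte, error) {
	if *remaining == 0 {
		return nil, io.EOF
	}
	*remaining--
	return []byte("frame"), nil
}

func main() {
	ctx := context.Background()
	da := fakeDAClient{}
	remaining := 3

	// The current loop: each DA put blocks before the next frame can even be produced.
	for {
		frame, err := nextTxData(&remaining)
		if err == io.EOF {
			return
		}
		commitment, err := da.Put(ctx, frame) // blocking call
		if err != nil {
			return
		}
		fmt.Printf("queueing commitment %q for L1\n", commitment) // hand-off to the txmgr
	}
}
```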
# Proposed Changes

This section contains proposed changes related to concurrent alt-da blob submissions. Change 1 is the main and most important one, and also the simplest. Change 2 is a nice to have, but is more involved and not currently needed, so it could be added later. We have implemented change 1 but not change 2.

## 1. Main fix: run da-server calls in goroutines (implemented)

![IMG_FA239C50BEED-1](https://hackmd.io/_uploads/SymhriJh0.jpg)

The idea is implemented in this [PR](https://github.com/Layr-Labs/optimism/pull/7). The main idea is simply to use an errgroup for da publishing, just like the one used for tx publishing.

### Implementation Details

In order not to affect the current normal (non-alt-da) execution path, we only run alt-da requests in a goroutine [inside the sendTransaction](https://github.com/celo-org/optimism/blob/fede4453000981bccc2147223a258bae82a61e2a/op-batcher/batcher/driver.go#L504) function. This changes the behavior of the [for-loop](https://github.com/ethereum-optimism/optimism/blob/7373ce7615b3607b328ad22672b1adb3c7cf3701/op-batcher/batcher/driver.go#L429) inside publishStateToL1:

- previously, a failing da server request would break the for loop, ensuring frames were sent to L1 in order
- now, since requests are sent in goroutines, `sendTransaction` returns nil after spawning a goroutine and lets the for loop continue and send all available frames (up to a maximum number of running goroutines). Thus, frames could be sent out of order if one goroutine errors. This is not a problem because the derivation pipeline can deal with it, but if for some reason we didn’t want this behavior, we could use `errgroup.WithContext()` and cancel all running goroutines when one fails.

Another thing to note is that txs that fail at the txmgr will have their frames put back into the channel and resubmitted to the alt-da server. We are assuming that the da server uses caching to prevent double submissions to the DA layer.
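For illustration, here is a minimal, self-contained sketch of the errgroup pattern. The helper names (`putToAltDA`, `sendCommitmentTx`) and the limit of 4 are hypothetical, not the actual PR code; the real change lives inside `sendTransaction` as described above.

```go
// Minimal sketch of the errgroup pattern; putToAltDA and sendCommitmentTx are
// hypothetical stand-ins for the alt-da client call and the txmgr hand-off.
package main

import (
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func putToAltDA(frame []byte) ([]byte, error) {
	time.Sleep(2 * time.Second) // simulate a slow DA round trip
	return append([]byte("commitment-of-"), frame...), nil
}

func sendCommitmentTx(commitment []byte) error {
	fmt.Printf("queueing %q for L1\n", commitment)
	return nil
}

func main() {
	var daGroup errgroup.Group
	// Analogous to a MaxConcurrentDARequests-style limit: Go() only blocks once this
	// many DA requests are already in flight, so the driver loop can keep draining
	// the channelManager while earlier requests are still pending.
	daGroup.SetLimit(4)

	frames := [][]byte{[]byte("frame0"), []byte("frame1"), []byte("frame2")}
	for _, frame := range frames {
		frame := frame
		daGroup.Go(func() error {
			commitment, err := putToAltDA(frame)
			if err != nil {
				return err // in the batcher, the frame would be requeued instead
			}
			return sendCommitmentTx(commitment)
		})
	}
	if err := daGroup.Wait(); err != nil {
		fmt.Println("da publishing error:", err)
	}
}
```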
### Testing

We added a basic [e2e test](https://github.com/Layr-Labs/optimism/pull/7/files#diff-8bc46de6e90dc6a2a1773e2a90eca15e631d5b7e062fc2f282371082faa30e52R1406) using a fakeDaServer that sleeps for 5 seconds before returning commitments, in order to simulate slow EigenDA responses. We also tested this manually by running the devnet, submitting large txs, and observing the behavior.

### Stop/Shutdown Behavior

One of the open questions below deals with how we want to handle stops and shutdowns with respect to pending da requests (which might take a long time to return). For background, let’s first look at how pending eth txs are dealt with.

The driver has an `l.shutdownCtx` and an `l.killCtx`, which are used in different places. The txmgr queue is [created with `l.killCtx`](https://github.com/ethereum-optimism/optimism/blob/f370113e8de436ad2af5dd093b45366bc8ab352f/op-batcher/batcher/driver.go#L304), so that shutdown first has some time to flush all txs to ethereum before the killCtx is cancelled.

As for the main batchSubmitter loop, there are 2 different paths/reasons to [stop the main loop](https://github.com/ethereum-optimism/optimism/blob/f370113e8de436ad2af5dd093b45366bc8ab352f/op-batcher/batcher/driver.go#L143):

1. stop batchSubmitting but keep the batcher service running: via a [stop-batcher](https://github.com/ethereum-optimism/optimism/blob/f370113e8de436ad2af5dd093b45366bc8ab352f/op-batcher/rpc/api.go#L41) admin api call (the main loop can later be restarted using the equivalent start-batcher admin api call)
    - in this case shutdownCtx is cancelled first, and only after the main loop exits is killCtx cancelled. This gives time for all txs to get flushed to ethereum.
2. via a [normal shutdown](https://github.com/ethereum-optimism/optimism/blob/f370113e8de436ad2af5dd093b45366bc8ab352f/op-batcher/batcher/service.go#L386) (typically from receiving a SIGTERM or SIGINT)
    - in this situation, [txmgr.Close() is called](https://github.com/ethereum-optimism/optimism/blob/f370113e8de436ad2af5dd093b45366bc8ab352f/op-batcher/batcher/service.go#L395), which abruptly closes the eth client connection and cancels all pending txs (see this [PR](https://github.com/ethereum-optimism/optimism/pull/8694) for background).

### Open questions

- How do we differentiate between stops and shutdowns for da requests? txmgr uses a separate txmgr.Close() function to quickly kill pending txs during shutdown, and we might want a similar mechanism for DA requests. In the meantime, we’ve opted to cancel all pending DA requests for both stops and shutdowns, by using `l.shutdownCtx` (see [here](https://github.com/Layr-Labs/optimism/pull/7/files#diff-c734d1296b2fd691221b92df3edf09c7533c507a74c2316117745c75c3ad5776R584) and the comment above) instead of the currently used `l.killCtx`. This seems fine since the txData is [pushed back into the channel on error](https://github.com/Layr-Labs/optimism/blob/81cf5db4ce271672a08004fee27299a953b1925d/op-batcher/batcher/driver.go#L589) and would be resubmitted on a later start().
- The errgroups for alt-da and eth-da are separate: should we combine them into a single goroutine that does both calls serially? This would also mean combining their max-concurrent-goroutine flags/parameters ([MaxPendingTransactions](https://github.com/Layr-Labs/optimism/blob/d78ffa9ad7f0449305ff2f8586e55ccf5616e2f0/op-batcher/batcher/service.go#L37) and [MaxConcurrentDARequests](https://github.com/Layr-Labs/optimism/blob/d78ffa9ad7f0449305ff2f8586e55ccf5616e2f0/op-batcher/batcher/service.go#L43)).
- How do we feel about renaming the [MaxL1TxSizeBytes](https://github.com/Layr-Labs/optimism/blob/d78ffa9ad7f0449305ff2f8586e55ccf5616e2f0/op-batcher/flags/flags.go#L73) flag/param to MaxFrameSizeBytes, to better represent its dual use in the alt-da case, where the actual L1 tx size is the size of the commitment, not of the frame?

## 2. Multi-commitment txs (not implemented - future nice to have)

> This is not implemented, and not currently needed.

Channels were created to solve the blocks → eth-da calldata/blobs packing problem, in the sense that a single L2 block can be spread over multiple frames, and possibly over multiple L1 txs. With alt-da, channels are repurposed to pack alt-da blobs (which can be much larger). Each blob commitment is then currently submitted as a single eth-da tx. An eigenDA “commitment” is ~450 bytes, which means at least 250 of them fit in a single eth-da tx. Here, a single eth-da tx would instead contain multiple alt-da commitments, similar to how up to 6 frames can fit in a single 4844 tx by sending each frame as an attached blob. Eventually, if calldata gas costs become very expensive, alt-da commitments could even be sent via an eth-da blob tx.
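As a purely illustrative sketch of the packing idea (this is not a proposed wire format, and the derivation pipeline would have to be taught whatever encoding is chosen, which is the complication noted below), several commitments could for example be length-prefixed into a single calldata payload:

```go
// Illustrative only: one possible way to pack several alt-da commitments into a
// single calldata payload using uvarint length prefixes. The derivation pipeline
// does not understand any such multi-commitment payload today.
package main

import (
	"encoding/binary"
	"fmt"
)

func packCommitments(commitments [][]byte) []byte {
	var out []byte
	for _, c := range commitments {
		out = binary.AppendUvarint(out, uint64(len(c))) // length prefix
		out = append(out, c...)                         // commitment bytes
	}
	return out
}

func main() {
	// ~450-byte commitments, so a ~128KB calldata tx could hold a few hundred of them.
	commitments := [][]byte{make([]byte, 450), make([]byte, 450), make([]byte, 450)}
	payload := packCommitments(commitments)
	fmt.Println("packed payload size:", len(payload)) // 1356 bytes for three commitments
}
```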
One complication with this approach is that it requires changing the derivation pipeline to understand this packing.

Celestia maintains a [fork of op-stack](https://github.com/ethereum-optimism/optimism/compare/develop...celestiaorg:optimism:celestia-develop?diff=split&w=) which supports multi-frame txs for celestia DA. They diverged from the [DA server API](https://specs.optimism.io/experimental/alt-da.html#da-server) by adding endpoints that submit multiple blobs in parallel. This might be a nice change to add to the DA server API, but we could also get the same result by making multiple parallel requests to the DA server.

# Suggested Event Loop Refactor

> NOTE: this refactor is actually orthogonal to concurrent alt-da requests. Either, or both, can be implemented.

This section describes a refactor of the main driver loop that we deem would improve the readability and maintainability of the code, but that is not strictly necessary to implement the concurrent alt-da requests. See this [PR](https://github.com/Layr-Labs/optimism/pull/9) for an initial implementation of this idea.

If we look at the main architecture diagram, we see that both the main loop goroutine and the receipt processing goroutine talk to the channelManager. This is why the channelManager has a [mutex](https://github.com/ethereum-optimism/optimism/blob/2b128c70c7e2e083bea847506fc4da2a6f3a76ab/op-batcher/batcher/channel_manager.go#L28), and why most of its methods [lock](https://github.com/ethereum-optimism/optimism/blob/2b128c70c7e2e083bea847506fc4da2a6f3a76ab/op-batcher/batcher/channel_manager.go#L153) it for their entire duration.

Furthermore, the event loop blocks on [queue.Send](https://github.com/ethereum-optimism/optimism/blob/2b128c70c7e2e083bea847506fc4da2a6f3a76ab/op-batcher/batcher/driver.go#L601) in normal mode, as well as on put requests to the alt-da server. Even when using an errgroup, these calls will block once the goroutine [limit](https://pkg.go.dev/golang.org/x/sync/errgroup#Group.SetLimit) has been reached. Thus, if we want to add any other processing to the event loop (monitoring, background jobs, etc.), it might not run in a timely fashion.

We could bring the receipt processing back into the main loop and make it an actual event loop. This way we could get rid of the mutex in the channelManager, and the da/tx publishing could be done in “task” goroutines. The main driver loop would then stay unblocked, with task goroutines submitting their results back over channels that the loop consumes in a select statement.

![IMG_BA1B20A54425-1](https://hackmd.io/_uploads/H1Rnrjk3A.jpg)
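A minimal, self-contained sketch of what such a single event loop could look like (the channel and type names are illustrative, not the code in the PR): since only this goroutine touches the channelManager, its mutex becomes unnecessary, and slow DA puts or tx sends never block receipt processing.

```go
// Sketch of the suggested single event loop; txReceipt/daResult and the channel
// wiring are illustrative, not the actual refactor in the linked PR.
package main

import (
	"context"
	"time"
)

type txReceipt struct {
	id  string
	err error
}

type daResult struct {
	commitment []byte
	err        error
}

// eventLoop is the only goroutine that owns the channelManager state.
func eventLoop(ctx context.Context, pollInterval time.Duration, receipts <-chan txReceipt, daResults <-chan daResult) {
	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// load new L2 blocks into the channelManager and hand frames to
			// short-lived "task" goroutines (DA puts, tx sends) that report
			// back over daResults / receipts instead of blocking here
		case res := <-daResults:
			_ = res // enqueue the commitment tx, or requeue the frame on error
		case rcpt := <-receipts:
			_ = rcpt // mark frames confirmed, or requeue them for failed txs
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	eventLoop(ctx, 100*time.Millisecond, make(chan txReceipt), make(chan daResult))
}
```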
We were originally worried that block processing and channel compression would take a long time, hogging the CPU and not letting the event loop process tx receipts. However, a pprof run (admittedly on a devnet op-batcher) shows that, when submitting 128KB of txs every 2 seconds (to make each L2 block fill up an entire frame), a 20s profile only saw 350ms of processing (1.75% CPU utilization).

```jsx
File: op-batcher
Type: cpu
Time: Aug 26, 2024 at 9:32pm (PDT)
Duration: 20s, Total samples = 350ms ( 1.75%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 220ms, 62.86% of 350ms total
Showing top 10 nodes out of 154
      flat  flat%   sum%        cum   cum%
      50ms 14.29% 14.29%       50ms 14.29%  encoding/json.stateInString
      40ms 11.43% 25.71%       60ms 17.14%  encoding/json.(*decodeState).skip
      20ms  5.71% 31.43%       50ms 14.29%  compress/flate.(*compressor).deflate
      20ms  5.71% 37.14%       30ms  8.57%  compress/flate.(*decompressor).huffSym
      20ms  5.71% 42.86%       40ms 11.43%  encoding/json.checkValid
      20ms  5.71% 48.57%       20ms  5.71%  runtime.memclrNoHeapPointers
      20ms  5.71% 54.29%       20ms  5.71%  runtime.memmove
      10ms  2.86% 57.14%       20ms  5.71%  bufio.(*Reader).Peek
      10ms  2.86% 60.00%       10ms  2.86%  bufio.(*Reader).ReadByte
      10ms  2.86% 62.86%       10ms  2.86%  compress/flate.(*huffmanBitWriter).generateCodegen
```

Submitting 1 MB/s resulted in ~30% CPU utilization, which still feels reasonable.

```jsx
Duration: 20.11s, Total samples = 5.91s (29.39%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 4520ms, 76.48% of 5910ms total
Dropped 227 nodes (cum <= 29.55ms)
Showing top 10 nodes out of 255
      flat  flat%   sum%        cum   cum%
    1260ms 21.32% 21.32%     1260ms 21.32%  runtime.memmove
     960ms 16.24% 37.56%      960ms 16.24%  runtime.memclrNoHeapPointers
     610ms 10.32% 47.88%      610ms 10.32%  encoding/json.stateInString
     490ms  8.29% 56.18%      820ms 13.87%  encoding/json.checkValid
     310ms  5.25% 61.42%      310ms  5.25%  runtime/internal/syscall.Syscall6
     290ms  4.91% 66.33%      880ms 14.89%  compress/flate.(*compressor).deflate
     260ms  4.40% 70.73%      470ms  7.95%  encoding/json.(*decodeState).skip
     120ms  2.03% 72.76%      120ms  2.03%  encoding/hex.Encode
     110ms  1.86% 74.62%      120ms  2.03%  compress/flate.(*decompressor).huffSym
     110ms  1.86% 76.48%      170ms  2.88%  encoding/json.appendCompact
```