---
tags: mev-boost-community-call
---

# mev-boost community call #0 recap

Recording: [https://www.youtube.com/watch?v=9LTP6rC-HNA](https://www.youtube.com/watch?v=9LTP6rC-HNA)

# Summary

- First MEV-Boost Community Call. The goal of these calls is to discuss MEV-Boost, relays, builders, the upcoming hard fork (Capella), and broadly where all of this is heading. Future calls will likely be supply-driven.
- Changes to the `builder-specs` and `relay-specs` are in progress for Capella. The main change is updating types to support the new formats, e.g. adding withdrawals.
- Testing the MEV-Boost software stack is an involved process, especially around hard forks and the fork boundary. Some ideas are proposed for integrating the relevant binaries into automated testing to reduce the load on operators.
- Discussed the Optimistic Relay proposal from the Ultrasound relay. Rollout is first on Goerli and then on mainnet with a handful of select builders who have expressed interest in being early testers / adopters.

# Capella

### Specs

To kick off, the discussion is on the `builder-specs` and updating them for Capella. The builder spec is the core API that CL clients use to talk to the builder network. Changes for Capella consist of updating types to support the new formats, e.g. withdrawals in blocks.

A similar repo is `relay-specs`, maintained by Flashbots, which is likewise a Swagger spec. The main change here is in the block submission APIs, which need to support either block type (Bellatrix or Capella). Generally, the main implementation of the spec is the Flashbots relay. If you want to make a conforming relay, this spec is the place to start.

### Looking ahead to Deneb

There is also a PR for the Deneb fork (EIP-4844 changes). It is a bit more involved since the builder APIs change to accommodate blobs. It is also still WIP since the design of how blobs get passed around is still changing. This can take some time, and the changes here will come downstream of whatever the CL teams end up deciding.

### Readiness

Next are some discussions around Capella hard fork readiness. At a high level, there are changes needed across the entire stack. From a relay perspective, the key thing is operators being aware that they have work to do for this hard fork.

(Chris from Flashbots gives an overview of Capella changes across the stack.) There are changes at many layers, including the relay, the builder, the validation node, MEV-Boost, and also the CL client. As of the time of the call, the CL client in question is the Flashbots Prysm fork. For example, one change is an additional endpoint for expected withdrawals so that the relay can validate and filter out blocks with invalid withdrawals. As of the time of the call, the MEV-Boost relay depends on the Flashbots Prysm fork for this endpoint. There is work on an SSE subscription, with the SSE event standardized across clients; once that lands, any CL client can be used to drive the relay.

Longer term, with future hard forks requiring further changes, the idea is to have a single SSE endpoint linking everything together instead of adding more endpoints. The main thing is giving relays access to the relevant information, so it makes sense to standardize the SSE event. There is also the aspect of performance: so far RANDAO and withdrawals are cheap for the CL to compute, so there is no big cost to doing this, but something like blobs carries more data. Ideally the endpoint remains lightweight.
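To make the SSE-driven design concrete, here is a minimal sketch (in Go, the language of the relay codebase) of what a relay-side consumer of such an event stream could look like. The endpoint path, the topic name `payload_attributes`, and the JSON fields are assumptions for illustration only, since the event was not yet standardized at the time of the call.

```go
// Sketch of a relay consuming a CL SSE stream for per-slot data (prev RANDAO,
// expected withdrawals). Endpoint, topic, and field names are hypothetical.
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"net/http"
	"strings"
)

// payloadAttributesEvent is an assumed shape for the event body.
type payloadAttributesEvent struct {
	ProposalSlot string `json:"proposal_slot"`
	PrevRandao   string `json:"prev_randao"`
	Withdrawals  []struct {
		Index   string `json:"index"`
		Address string `json:"address"`
		Amount  string `json:"amount"`
	} `json:"withdrawals"`
}

func main() {
	// Hypothetical local beacon node URL and topic name.
	resp, err := http.Get("http://localhost:5052/eth/v1/events?topics=payload_attributes")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // allow large withdrawal lists

	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // skip "event:" lines and keep-alives
		}
		var ev payloadAttributesEvent
		if err := json.Unmarshal([]byte(strings.TrimPrefix(line, "data:")), &ev); err != nil {
			log.Printf("skipping malformed event: %v", err)
			continue
		}
		// A relay would cache these per-slot values and use them to validate
		// builder submissions (RANDAO mix, withdrawals root).
		log.Printf("slot %s: randao=%s, %d withdrawals", ev.ProposalSlot, ev.PrevRandao, len(ev.Withdrawals))
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```

The point of the single subscription is that the relay only needs one standardized feed from any CL client, rather than one bespoke endpoint per fork.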
All changes have been run through the Zhejiang testnet and are largely complete; most remaining work as of the time of the call is polishing and merging. The next steps are to participate in the Sepolia and Goerli upgrades. It would also be great to have more relays participate; ideally more relays are ready on Sepolia and Goerli for hard fork testing. The Ultrasound relay is currently running on Goerli. If Sepolia goes well, the Goerli fork will probably happen within two weeks or so after that. Blocknative will be testing on Goerli, is not running on Sepolia, and will be paying attention to the Goerli fork date.

(Chris gives an additional update on Zhejiang.) It was a big push and Flashbots put the entire software stack there. It is now in a place where there are no errors and everything works post-fork. The next step is to test the actual fork boundary (before/after the fork). Overall, it was very useful for testing everything.

On the topic of testing the boundary, Alex mentions that the EF has been working on hive support; for example, the circuit breaker has already been tested in hive. (Hive is an automated client testing framework.) That produced some results on how clients behave around the circuit breaker. Perhaps relays could be tested under fork scenarios as well, since the transition seems to be the hard part to test. There is general agreement that automated testing would be valuable, starting with hive as the foundation.

### mev-rs

Alex has been working on a Rust implementation of essentially the whole MEV-Boost stack. It is still pretty early and currently a bit bandwidth-constrained. Part of this is building a "mempool builder": the idea is to emulate a builder, through the builder APIs, using the mempool of a local execution client. Some clients are already interested in using this for testing, e.g. getting the MEV changes done ahead of time and then allowing a bunch of testing on the CL side.

### Relay testing

Terence from Prysm raises a question about relay readiness for Goerli and for testing going through the fork, and whether the code not being ready is the main blocker. The Ultrasound team is very constrained from an HR standpoint. One issue with testing is that, to be effective, the relay needs builders to connect, and Ultrasound has already spent two months convincing builders. Dummy builders are an option, but that adds complexity since the relay team would then be in charge of setting up both. Validators also need to be brought along.

Chris mentions that the Flashbots builder can run on any testnet, can send fake bundles, and can run as both a block builder and a block validator. Even though this is about as easy as it gets, it is still as involved as running a Geth instance. But it is doable to set everything up and get winning blocks. Justin's point is that capacity-wise this is a lot of work for testnets. One idea is to coordinate with relays to point to releases and start including relay binaries in automated testing. A focused call on relay testing is another option.

Folks at Blocknative would be happy to work with a testnet to try this out, though they note that a certain amount of resources is needed to spin up relays / builders / bundle injection for the full pipeline. Readiness-wise, the Dreamboat relay team (at Blocknative) is looking internally at what they need to accommodate Capella but has not committed any code yet.

There seem to be two distinct things:

1) Relay operators have a software stack. If there is a software artifact, how do we test it? (A minimal sketch of what this could look like follows below.)
2) Relay operators will have full deployments. How do we test those?
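For the first point, a rough sketch of an automated check against a relay artifact might be a Go test that hits the builder-specs `status` and `getHeader` endpoints of a locally running relay. The relay URL, slot, and key values below are placeholders, and a real harness would also drive a builder (real or dummy) to submit blocks; this is only a smoke-test sketch, not an actual test suite.

```go
// Smoke-test sketch against a relay binary assumed to be running locally.
package relaytest

import (
	"fmt"
	"net/http"
	"strings"
	"testing"
)

const relayURL = "http://localhost:9062" // hypothetical local relay under test

// TestRelayAlive checks the builder-specs status endpoint.
func TestRelayAlive(t *testing.T) {
	resp, err := http.Get(relayURL + "/eth/v1/builder/status")
	if err != nil {
		t.Fatalf("relay unreachable: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}

// TestGetHeader asks for a header for a given slot. A 200 means a bid is
// available; a 204 (no bid) is also acceptable for a smoke test, since it
// still exercises the endpoint end to end.
func TestGetHeader(t *testing.T) {
	slot := 123456                                 // placeholder slot
	parentHash := "0x" + strings.Repeat("00", 32)  // placeholder values; a real
	pubkey := "0x" + strings.Repeat("00", 48)      // test reads these from the chain
	url := fmt.Sprintf("%s/eth/v1/builder/header/%d/%s/%s", relayURL, slot, parentHash, pubkey)
	resp, err := http.Get(url)
	if err != nil {
		t.Fatalf("getHeader failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusNoContent {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}
```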
It is tricky to tell operators to spin up the stack for every testnet on every fork. Perhaps, as long as relays can produce the software they will run, automated testing can handle the rest.

(Justin asks if it will be possible to see a list of people operating relays for Sepolia.) The understanding is that this will have to be checked per operator; definitely not Dreamboat. For future hard forks with lots to do, automated testing would be nice. A relay can be run on Sepolia, but then people need to use it. One distinction between the testnets is that the Sepolia validator set is closed, unlike Goerli's. This is deliberate and serves different testing use cases (for example, Sepolia for application testing and Goerli for staking). If anyone really wants to test on Sepolia, most people know the validators, so it is possible to arrange.

### Local path for block construction

Terence brings up that post-Capella, clients have the option to go down the local path for block building and explicitly compare the value of the remote (MEV-Boost) block vs. the local one, since they will now know the effective bid of the block. Relays should be aware of this to avoid potential surprises, as clients are able to pick the higher-value local block. The CL client does the decision-making between the MEV-Boost remote block and the local block. The min-bid feature in MEV-Boost retains its own functionality. So altogether, a remote bid needs to be above the min-bid limit to make it into the CL at all, and now there is also the local bid to compare against the remote one.

# Optimistic Relay

Mike is the main person working on Optimistic Relaying (OR). He has made a PR against the Flashbots MEV-Boost relay repository, primarily for visibility; upstreaming / merging is still under discussion.

### Elevator pitch for OR

Optimistic relaying changes how the block submission flow works. Currently, blocks are submitted by builders and validated by the relay by executing them against dedicated Geth validation nodes. After this check, blocks are marked as valid / eligible to win the block auction. Importantly, this needs to happen before `getHeader()` is called, because `getHeader()` returns the maximum bid that the relay is aware of. So, if a bid is not marked as eligible, it will not get picked up by `getHeader()`. This comes into play since it has been observed that the most valuable bids, and therefore the submissions that would end up winning the auction, come right before the deadline of the slot.

The change with OR is to mark bid submissions as eligible immediately and validate later. Bids become eligible for `getHeader()` calls immediately and get queued up in a separate goroutine for validation. This helps with the case of receiving blocks close to the block deadline, where the late blocks have more MEV. Again, the issue with non-optimistic relaying is that the block simulation takes time (the call from the relay to the validation node plus the time to execute the block), so these late blocks basically do not get marked as eligible in time. With OR this is no longer an issue, and so we get blocks with more MEV for the builder and the proposer.

The main issue with this is an invalid block being submitted: since the submission might not get simulated in time, it may be picked by `getHeader()` and lead to a missed slot. The proposed solution is collateral posted by builders, which incentivizes proper behavior. The Ultrasound team has already done some work to understand the time savings, specific to the Ultrasound relay.
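Below is a simplified Go sketch of the flow just described, including the check that a bid above the builder's posted collateral falls back to synchronous ("pessimistic") simulation, which comes up again in the rollout discussion. This is illustrative rather than the actual PR: the types, field names, and demotion policy are condensed assumptions.

```go
// Condensed sketch of optimistic block submission handling.
package relay

import (
	"log"
	"math/big"
	"sync"
)

// Submission is a stand-in for a builder block submission.
type Submission struct {
	BuilderPubkey string
	Slot          uint64
	Value         *big.Int // bid value in wei
	Block         []byte   // encoded payload (placeholder)
}

type Relay struct {
	mu           sync.Mutex
	collateral   map[string]*big.Int // builder pubkey -> posted collateral
	demoted      map[string]bool     // builders excluded from optimistic mode
	queue        chan Submission     // asynchronous validation queue
	simulate     func(Submission) error
	markEligible func(Submission)
}

// HandleSubmission is the hot path. If the builder is in good standing and the
// bid value is covered by its collateral, the bid becomes eligible for
// getHeader immediately and is validated later; otherwise it is simulated
// synchronously as before (the pessimistic path).
func (r *Relay) HandleSubmission(s Submission) {
	r.mu.Lock()
	coll, ok := r.collateral[s.BuilderPubkey]
	optimistic := ok && !r.demoted[s.BuilderPubkey] && s.Value.Cmp(coll) <= 0
	r.mu.Unlock()

	if !optimistic {
		if err := r.simulate(s); err != nil {
			log.Printf("rejecting invalid block from %s: %v", s.BuilderPubkey, err)
			return
		}
		r.markEligible(s)
		return
	}

	// Optimistic path: eligible now, validated in the background.
	r.markEligible(s)
	r.queue <- s
}

// validationWorker runs in a separate goroutine and demotes any builder whose
// optimistically accepted block turns out to be invalid.
func (r *Relay) validationWorker() {
	for s := range r.queue {
		if err := r.simulate(s); err != nil {
			log.Printf("demoting builder %s after invalid block: %v", s.BuilderPubkey, err)
			r.mu.Lock()
			r.demoted[s.BuilderPubkey] = true
			r.mu.Unlock()
			// In the real design, this is also where a refund process for an
			// affected proposer would be triggered.
		}
	}
}
```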
### Refunds to proposer

A question is brought up on the topic of making proposers whole: when outsourcing to the external builder network, there is a risk of an invalid block being delivered, and with optimistic relaying this can happen as explained earlier. As far as the validator is concerned, how does the relay recover from the reputation damage if or when builders fail to provide a valid block and/or cause a missed slot?

(Justin and Mike provide thoughts.) At a high level, the Ultrasound thesis is that builder reputation is extremely valuable, so in the event of builder errors, builders are incentivized to pay proposers rather than essentially running away. The relay itself can, if needed, refund the proposer, since it has the fee recipient from the validator registration messages. There have already been refunds to proposers due to bugs, and there do not seem to have been issues giving them. Worth highlighting is that proposers get compensated with a delay; the goal at Ultrasound is to make a refund within 24 hours.

In terms of reputation damage to the relay itself, the idea is that it should be minimal. With sufficient education within the community, there will be an understanding that the builder, not the relay, was at fault. Ultrasound thinks they can refund a proposer promptly, and the goal is to be transparent with the refunds. It is suggested to have a public log exposing bad blocks / faults, perhaps somewhere on GitHub or similar.

As an operator, the aim is to minimize bad events. But if and when there is a bad event, first and foremost ask the builder to fix the bug that caused the issue. Then, potentially apply a timeout / cooldown period for the builder on OR, e.g. a one-day cooldown from optimistic relaying, which has the added effect of more losses for the builder from not winning auctions. Finally, perhaps an additional fixed fee. Still, this is generally not expected to happen often. It is overwhelmingly likely that it will not lead to a loss for a proposer. Builders produce lots of blocks, so chances are invalid blocks will lead to a demotion very quickly. For example, if each builder submits around 50 blocks per slot, in total something like 1000 submissions per slot can be expected. Even one invalid submission among these leads to the builder being demoted. So, since demotion is likely to happen well before an invalid bid ever becomes the winning bid, refunds are not expected to be a common situation.

### Performance

Unblinding of the block still works the same. One of the bottlenecks is receiving / decoding blocks over the network: unmarshalling the message into a Go object takes a while. Some optimizations will be coming around parsing the header first and marking the bid as eligible even faster. In that design the relay takes on the additional risk of not receiving the block body from the builder, so this will come as a follow-up performance improvement since it adds some complexity.

Terence asks how attestation rewards would be refunded. The idea is the same, in that this will not cause many more missed slots. Justin: in the worst case, one missed slot per builder per manual intervention, each a single isolated missed slot. Empirically there are currently on the order of 10 orphan blocks, which amounts to essentially zero network degradation, and here it is several orders of magnitude less degradation than that. Finally, in the grand scheme of things, the chain will still keep running.
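As a rough illustration of the "parse the header/bid first" idea (not the actual relay code): decode only the bid-trace portion of a submission up front, deferring the large execution payload. The envelope shape and field names below are simplified assumptions about the submit-block request.

```go
// Sketch of decoding the bid portion of a submission before the full payload.
package relay

import "encoding/json"

// bidTrace carries only the fields needed to register the bid quickly.
type bidTrace struct {
	Slot          string `json:"slot"`
	BuilderPubkey string `json:"builder_pubkey"`
	Value         string `json:"value"` // wei, as a decimal string
}

// submissionEnvelope defers decoding of the (large) execution payload by
// keeping it as raw bytes.
type submissionEnvelope struct {
	Message   bidTrace        `json:"message"`
	Payload   json.RawMessage `json:"execution_payload"`
	Signature string          `json:"signature"`
}

// decodeBidFirst parses just enough of the submission to make the bid eligible
// for getHeader; the full payload can be decoded afterwards (or in a
// background goroutine) for validation.
func decodeBidFirst(body []byte) (*bidTrace, json.RawMessage, error) {
	var env submissionEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, nil, err
	}
	return &env.Message, env.Payload, nil
}
```

### Rollout

Alex asks about the rollout strategy. Is the idea to merge into the Flashbots relay? Make it opt-in or enable it by default?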
The current status is that it is running on Goerli. The idea is to move to production (mainnet) with a few trusted builders. The PR against the Flashbots repo was made for visibility; whether to upstream it now or later, after running in production, will still be discussed. It will definitely be opt-in. The start will be with a small set of builders and a small amount of collateral, then see how it runs and learn.

Alex points out that the relays right now are either exactly the Flashbots relay or minimal forks of it, so that is one thing to consider around merging things into the repository. From the Ultrasound perspective, the expectation is that this is a race to zero, i.e. a relay that is not running optimistic relaying becomes irrelevant as a relay. Ultrasound has asked on the Flashbots Discord whether builders want to be in the early batch of testing, and is keeping collateral low to keep it accessible.

Something to consider is that smaller relays might want to adopt this, but holding collateral might pose additional challenges. Ultrasound has been exploring the role of a guarantor, which basically allows running the relay without collateral, with reputation at stake or some legal contract instead. One thing Ultrasound has observed is that builder reputation is worth a lot, for example by translating into bundle flow and proprietary transaction flow. In collateral terms, the belief is that reputation is worth an order of magnitude more than 1 ETH. So, the alternative setup is being willing to not take collateral from identifiable builders. Then, if and when a bad block happens, the belief is that those builders will refund the proposer directly, without an intermediary. If not, the guarantor would step in and make the proposer whole. Another way to look at this is trusting the builder with one's own funds (depositing the collateral for them).

One thing to note regarding high-value blocks: if the value of the block exceeds the collateral, it is simulated as before. Ideally the amount of collateral per builder is limited. Looking at the MEV distribution, the vast majority of blocks have MEV < 1 ETH, somewhere on the order of 95%. Ultrasound wants to improve the censorship resistance of Ethereum, so 95% is good enough; the rest can go through pessimistic simulation. In terms of fixing the collateral amount, perhaps there could be a policy capping it to target the 95th percentile of block value.

# Future calls

Future calls will be supply-driven, maybe monthly. Everyone is encouraged to chime in on the Flashbots forum post; the suggestion is to have another call before the Goerli hard fork.