---
tags: eth2devs
description: Notes from the regular proof of stake [Eth2] implementers call
image: https://benjaminion.xyz/f/favicon-96x96.png
---

# PoS Implementers’ Call #92 - 2022-07-28

[Quick contemporaneous notes by Ben Edgington; fka "Eth2 Implementers' Call"]

Agenda: https://github.com/ethereum/pm/issues/574

Livestream: https://youtu.be/XDfNg8mdC10

## Merge

[Pari] Two shadow forks.

Goerli SF5: no big issues (but see Nethermind below). Have been testing MEV-Boost (Prysm, Lodestar, Teku) with ~30% of the network now, looking fine.

Mainnet SF10: no client compatibility issues. Some Erigon nodes losing peers, Besu needed updating, but all more-or-less expected.

Goerli SF6 planned for next week.

### Execution behaviour around terminal blocks

[Mikhail] Related to GSF5. Issue showed up with Nethermind. Terminal block A was imported via gossip. After that, a different terminal block B was received. Nethermind does not process blocks that won't become the head of the chain: thus it added B to the block tree, but did not process it. The transition block was built on B, which caused Nethermind to return "SYNCING" as it could not validate the payload. The node got stuck as it can't switch to the new branch for "safe slots to import" slots, i.e. 128 slots.

The expected EL behaviour is that, whatever happens with multiple terminal blocks, if the EL has enough data to validate the transition block it must do so. Thus, Nethermind should have processed B here.

Erigon has [a fix](https://github.com/ledgerwatch/erigon/pull/4812). Geth is expected to be ok. Besu will check.

[Danny] We should add a note to the spec to execute all blocks on receipt near TTD even if it is expensive. Terminal blocks should be gossiped until the transition is finalised.

**Action: Danny and Mikhail to clarify this edge case in the spec**

### Exchange configuration before TTD is set for Mainnet

To allow people to set up their Merge configs (Beacon Node and Execution Client) ahead of time.

[PaulH] It is not clear on the execution side what value to use for `TransitionConfigurationV1` before TTD is set. Consensus side has well-defined behaviour. See [here](https://github.com/ethereum/pm/issues/574#issuecomment-1197771890) for the values.

Not clear whether to include this in the spec - probably not necessary.

### Extend optimistic node definition

See [here](https://github.com/ethereum/consensus-specs/pull/2955).

[Mikhail] Current definition: a node is optimistic when its head is optimistic. This PR addresses the case when you have a branch of optimistic blocks that justifies a new checkpoint, and the checkpoint is justified in the block store. In parallel the EL is catching up with the payloads. If any payloads are invalid we must remove the branch from the block store, which may leave us in a situation with no tip that matches the justified checkpoint in the store. There are various ways to deal with this. Rollback is dangerous (may lead to surround votes). Probably best to remain in optimistic sync and wait for further info from peers. We should have a general approach to dealing with this situation however it arises.

[Potuz] Consensus clients currently have different approaches. Teku and Lighthouse might end up gossiping invalid blocks after a reboot. Prysm has not decided on a solution.

[Dankrad] Do we consider justified blocks from optimistic sync actually justified?

[Danny] This situation really requires manual intervention to fix, and is a failure case resulting from an attack or breakage. Validators should not be voting on something that is optimistic.

[Mikhail] It would be ideal not to rely on slashing protection - a node in this state should not attest at all.

[Danny] A justified checkpoint should be fully validated if you are going to act on it.

[Potuz] If an attacker can trigger optimistic mode then they only need two blocks in order to cause problems. Prysm may follow Lighthouse in keeping an "invalid head" in the fork choice.

[PaulH] All the approaches seem reasonable, including Teku's.

[AdrianS] Not a big fan of "invalid head" - difficult to reason about. But staying in optimistic mode seems reasonable.

[PaulH] If a node does not distinguish between optimistic and invalid then all should be well.

[Potuz] This does not work for Prysm as it removes all invalid blocks, so would remove an optimistic head.

**Action: Mikhail to add more context to the PR.**

## MEV Boost

There is an open issue around delaying MEV-Boost at the transition.

[AlexS] [This PR](https://github.com/ethereum/builder-specs/pull/38) looks good to go unless there are any final objections.

### Liveness discussion: circuit breaker proposal

[AlexS] [Sketch of a proposal](https://hackmd.io/@ralexstokes/BJn9N6Thc) to address the case when a Relay does not release the signed block data. Suggestion: have a heuristic like "if 5 blocks in a row are missing", then suspend MEV-Boost and re-route block building to local execution clients.

[Mikhail] A higher threshold criterion (e.g. 16) might be better - what's the downside of having a larger rather than a smaller value?

[Dankrad] How about an exponentially increasing recovery period?

[Danny] That's reasonable, but also want to keep things simple. Could also do threat modelling - how likely is it for an adversary to trigger the mechanism at will, given a certain threshold?

[Sean Anderson] This is implemented in Lighthouse, but is configurable by the user. This makes it harder to game. Clients could have diverse defaults.

[Terence] Making it a client implementation detail is good.

[AdrianS] Concerned about adding complexity in order to avoid a corner case. Experience is that this often causes other issues.

[MartinHS] Everything should continue working if MEV-Boost goes offline. Based on that, some extra complexity is tolerable. A circuit breaker is a good idea. Random client-chosen values might be good.

[Danny] A circuit breaker can be simple. Worried that with only 1 or 2 relays they have significant power over the network.

[Danny] Is a percentage of missing blocks easier than an absolute number?

[AdrianS] It helps if the criterion is something that can be derived from the current state. Looking across forks adds significant complexity.

[Dankrad] Eventually we could have a gossip channel to which signed blinded blocks could be published. Then the malicious Relay withholding behaviour would be detectable. Not before the Merge, however.

(See chat highlights below for more on this. Conversation begins to turn more philosophical - watch the recording for details.)

[AlexS] There is some appetite for a circuit breaker. **Action: Alex will firm up a proposal**. Further discussion to go to the `#block-construction` channel on the R&D Discord.

[Danny] The MEV-Boost side-car design can help us to encapsulate mitigations.

Other MEV topic: Lighthouse and Nimbus are close to merging the MEV-Boost spec PR.

## General discussion and AOB

[Tim] There is an EIP-4844 call tomorrow. See the [PM repo](https://github.com/ethereum/pm/issues/581) for details.

[Pari] The Goerli blog post is out. Update your nodes!
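
[Editor's illustration] Two circuit breaker criteria came up in the MEV Boost liveness discussion: a run of consecutive missing blocks ("5 blocks in a row") and a fraction of missing blocks over a recent window (the chat floated "more than 1/3" over 32 or 64 blocks). The sketch below shows how such a check might look. It is not taken from any client or from the proposal itself: the class name, method names, and default values are all hypothetical, and it only looks at a single chain of slots, ignoring orphans and forks.

```python
from collections import deque


class CircuitBreaker:
    """Hypothetical check for when to stop using external builders.

    All names and defaults here are illustrative assumptions, not part of
    any spec or client implementation.
    """

    def __init__(self, max_consecutive_missed=5, window=32,
                 max_missed_fraction=1 / 3):
        self.max_consecutive_missed = max_consecutive_missed
        self.max_missed_fraction = max_missed_fraction
        # Only the most recent `window` slots are retained.
        self.slots = deque(maxlen=window)

    def record_slot(self, has_block: bool) -> None:
        """Record whether the latest slot contained a block."""
        self.slots.append(has_block)

    def consecutive_missed(self) -> int:
        """Number of empty slots at the tip of the recorded history."""
        count = 0
        for has_block in reversed(self.slots):
            if has_block:
                break
            count += 1
        return count

    def missed_fraction(self) -> float:
        """Fraction of recorded slots without a block."""
        if not self.slots:
            return 0.0
        return self.slots.count(False) / len(self.slots)

    def use_local_builder(self) -> bool:
        """True when either criterion says to fall back to local building."""
        if self.consecutive_missed() >= self.max_consecutive_missed:
            return True
        # Apply the fraction rule only once a full window has been observed.
        window_full = len(self.slots) == self.slots.maxlen
        return window_full and self.missed_fraction() > self.max_missed_fraction


breaker = CircuitBreaker()
for has_block in [True] * 10 + [False] * 5:
    breaker.record_slot(has_block)
print(breaker.use_local_builder())
```

Because the breaker depends only on recently observed slots, it recovers on its own once blocks start arriving again. A recovery period (such as the exponentially increasing one suggested in the discussion) or per-client randomised thresholds would be layered on top of this.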
* * *

# Chat highlights

From danny to Everyone 03:02 PM : https://github.com/ethereum/pm/issues/574

From Chris Hager to Everyone 03:05 PM : https://boost.flashbots.net

From Potuz to Everyone 03:11 PM : https://github.com/ledgerwatch/erigon/pull/4812 it's merged

From Mario Vega to Everyone 03:14 PM : I am working on exactly this scenario in hive, it would be nice to have the expected behavior in spec too

From Justin Florentine to Everyone 03:16 PM : i believe spec says to keep gossip up till 2 finalized epochs

From Marek Moraczyński to Everyone 03:18 PM : Mario I wrote some tests in Nethermind for gossip: https://github.com/NethermindEth/nethermind/pull/4327 it could be useful for you

From danny to Everyone 03:20 PM : TTD_NOT_SET = big_value

From Mikhail Kalinin to Everyone 03:20 PM : `2**256 - 2**10`

From danny to Everyone 03:20 PM : `2^256 - 2^10` `**`

From Marek Moraczyński to Everyone 03:20 PM : https://github.com/ethereum/consensus-specs/blob/981b05afb01d5b19be3a5a60ccb12c3582e4c0cf/configs/mainnet.yaml#L16

From Mikhail Kalinin to Everyone 03:22 PM : https://github.com/ethereum/consensus-specs/pull/2955

From Potuz to Everyone 03:29 PM : the justified checkpoint may be completely VALID in this situation and still need a reversal

From danny to Everyone 03:37 PM : I was misunderstanding the scope of the problem. understood now and will think on it

From Parithosh Jayanthi to Everyone 03:39 PM : Update nodes for goerli!

From stokes to Everyone 03:39 PM : https://github.com/ethereum/builder-specs/pull/38 https://hackmd.io/@ralexstokes/BJn9N6Thc

From stokes to Everyone 03:47 PM : default is a random value

From Micah Zoltu to Everyone 03:47 PM : Majority clients have a longer delay than minority clients.

From danny to Everyone 03:48 PM : still the upper bound is how you game it upper bound on the defaults

From stokes to Everyone 03:48 PM : yeah if any value is known it will be gamed

From Micah Zoltu to Everyone 03:48 PM : I concur with Adrian.

From stokes to Everyone 03:48 PM : which kind of suggests uniformity

From Micah Zoltu to Everyone 03:51 PM : I feel like I'm missing something. Why would MEV boost going down result in an outage?

From danny to Everyone 03:51 PM : not going down

From Micah Zoltu to Everyone 03:51 PM : I thought the clients *always* built a fallback block?

From danny to Everyone 03:51 PM : if you already commit to the blinded block

From stokes to Everyone 03:52 PM : proposers would attempt to use it but then rugged by mev-boost network

From danny to Everyone 03:52 PM : you can't fallback locally anymore

From Micah Zoltu to Everyone 03:52 PM : Ah.

From danny to Everyone 03:52 PM : so it's a malicious relay or a very very particular bug gas limit elasticity

From Chris Hager to Everyone 03:54 PM : That kind of reputation and monitoring is basically the idea of the relay monitor: https://github.com/flashbots/mev-boost/issues/142

From Paul Hauner to Everyone 03:57 PM : It’s hard

From Chris Hager to Everyone 03:57 PM : the relay monitor 👆 would also know about withholdings, but would be a trusted entity

From danny to Everyone 04:00 PM : more than 1/3 blocks missing over 32 or 64 blocks

From stokes to Everyone 04:01 PM : do we count orphans?

From danny to Everyone 04:01 PM : or 40% -- at least then you still have block elasticity to help with the gas limit

From stokes to Everyone 04:01 PM : otherwise a reorging attacker can corrupt things

From Chris Hager to Everyone 04:01 PM : mev-boost could disconnect from penalized relays, wouldn't need to be a BN change

From Micah Zoltu to Everyone 04:02 PM : I didn't realize proposers and relay wasn't bonded in The Merge version. 😢

From stokes to Everyone 04:02 PM : won't really have in-protocol bonding till in-proto PBS

From Micah Zoltu to Everyone 04:02 PM : I thought it was, which meant that it had some amount of sybil resistance and incentives to defend against this.

From stokes to Everyone 04:03 PM : they can opt-in to be bonded, but it wouldn't be enshrined in any sense e.g. DA committee for relays

From Micah Zoltu to Everyone 04:03 PM : We would need them to be bonded in a way that can be slashed for it to be effective I think.