# Flashbots for Ethereum Consensus Clients
## Introduction
### Flashbots
[Flashbots](https://docs.flashbots.net/) need no introduction, but their primary focus is *"to enable a permissionless, transparent, and fair ecosystem for MEV extraction."*
As of November 2021, Flashbots [claim](https://docs.flashbots.net/flashbots-auction/miners/quick-start) that 80% of the Ethereum hashrate accepts Flashbots bundles and that this increases the block reward by 0.3 ETH on average. The Flashbots team maintain [mev-geth](https://github.com/flashbots/mev-geth), a fork of geth modified to work with Flashbots.
Objectively, Flashbots is a major infrastructure-level Ethereum application. Subjectively, it may be strategic to make Flashbots as safe and as client-diverse as possible.
This document assumes that it is a good idea for *at least some* consensus clients to work together to provide out-of-the-box support for Flashbots and other proposer/builder separation schemes.
### Document Purpose
The intention of this document is to share information about the Flashbots architecture with consensus client teams and start a discussion about how Flashbots and other imminently-achievable proposer/builder separation schemes can work in multi-client, post-merge Ethereum.
### Contributors
The ideas in this document are not entirely my own, far from it. A majority of these concepts have come from the Flashbots team. Some have come from Ethereum Foundation researchers. Feel free to contact me if you'd like to have your name added to this document.
## The Flashbots Objective
The [Flashbots POS: Merge Architecture](https://hackmd.io/@flashbots/HJ3EbzVUY) does a good job at outlining the basics of Flashbots post-merge. That being said, I think there is more iteration to be done before a final design is found.
In summary, after the merge Flashbots expects to have the following actors:
- *Users/Searchers*: Maintain public and private tx pools and create *bundles*.
- *Builders*: Receive bundles and signed transactions from users/searchers and produce *execution payloads*. Builders are trusted by users/searchers.
- *Relays*: Route execution payloads between builders and validators. Relays are trusted by both builders and validators.
- *Validators*: Receive *execution payload headers* from relays and produce signed blocks containing those execution payload headers. Validators trust relays.
The high-level objective of the Flashbots PoS architecture is to allow validators to outsource the construction of an `ExecutionPayload` to a distributed network of builders.
From a consensus client perspective, this can be achieved by adding an additional component to the CL+EL combination which communicates with the network of relays, builders and users/searchers. In the Flashbots architecture, this component is named *mev_boost*. The layout of these components is discussed [later](#Is-mev_boost-Middleware).
## Changes to Consensus Clients
Supporting a reasonably secure Flashbots implementation involves significant changes to consensus clients.
It should be understood that the intention of these changes isn't to enshrine Flashbots in the protocol. Rather, it's to provide some infrastructure to allow *schemes like Flashbots* to operate safely.
### Change 1: Blind Transaction Signing
The Flashbots system requires that a validator signs a block without knowing the transactions included in it. This prevents the validator from including only a subset of the transactions, which would cause harm to other actors in the system (e.g., builders, searchers).
*Hold up, wait a minute..* Doesn't that mean mev_boost can affect the liveness of the chain? Yes, it does. That's the space we're working in here. A long-lasting liveness failure can be mitigated to some degree via a reputation scheme discussed briefly later.
Blinding can be achieved by having the validator sign over a `BeaconBlock` that has the type `ExecutionPayloadHeader` for the `body.execution_payload` field (rather than `ExecutionPayload`).
A more generic blinding implementation involves the validator signing over a `BeaconBlockHeader`, which hides the entire `BeaconBlockBody`.
Without choosing either of the blinding methods, let's call the type of block that has hidden transactions a `BlindedBeaconBlock` and `SignedBlindedBeaconBlock`.
That means the flow of block production appears like this:
```sequence
participant vc
participant bn
participant mev_boost
Title: Block Production
vc->bn: GET blinded_beacon_block
bn->mev_boost: GET execution_payload_header
mev_boost->bn: execution_payload_header
bn->vc: blinded_beacon_block
vc->vc: sign (slashable)
vc->bn: signed_blinded_beacon_block
bn->mev_boost: signed_blinded_beacon_block
mev_boost->mev_boost: un-blind
mev_boost->bn: signed_beacon_block
```
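To make the blinding idea a little more concrete, here is a minimal sketch of the first method (swapping the payload for its header). It is not the real SSZ types: the field set is trimmed down, `transactions_root` uses a plain SHA-256 as a stand-in for `hash_tree_root`, and the `blind`/`unblind` helper names are my own assumptions for illustration.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import List


def transactions_root(transactions: List[bytes]) -> bytes:
    # Stand-in for SSZ hash_tree_root: SHA-256 over length-prefixed transactions.
    h = sha256()
    for tx in transactions:
        h.update(len(tx).to_bytes(8, "little"))
        h.update(tx)
    return h.digest()


@dataclass(frozen=True)
class ExecutionPayload:
    # Trimmed-down field set, for illustration only.
    parent_hash: bytes
    fee_recipient: bytes
    transactions: List[bytes]


@dataclass(frozen=True)
class ExecutionPayloadHeader:
    parent_hash: bytes
    fee_recipient: bytes
    transactions_root: bytes


def blind(payload: ExecutionPayload) -> ExecutionPayloadHeader:
    # The header commits to the transactions without revealing them.
    return ExecutionPayloadHeader(
        parent_hash=payload.parent_hash,
        fee_recipient=payload.fee_recipient,
        transactions_root=transactions_root(payload.transactions),
    )


def unblind(header: ExecutionPayloadHeader, candidate: ExecutionPayload) -> ExecutionPayload:
    # Only a payload that matches the signed header may be substituted back in.
    if blind(candidate) != header:
        raise ValueError("payload does not match the signed header")
    return candidate
```

The point of the sketch is that the validator's signature over the header still binds it to one specific set of transactions, even though it never sees them.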
### Change 2: Proposal Promises
In order to produce an `ExecutionPayload`, the builder needs to know the `head_block_hash`, `timestamp`, `random` and `feeRecipient`. All of those values can be assumed, except the `feeRecipient`, which must be specified by the validator.
> Note: there are several competing methods for sharing the `feeRecipient` with builders. The method below shouldn't be considered canonical; other methods are discussed later in the document.
Therefore, the proposer for each slot must share the `feeRecipient` with all the builders sometime before the slot arrives. To prevent spam from non-validators, validators are expected to sign this message and share it with builders:
```python
class ProposalPromise:
    slot: Slot
    validator_index: ValidatorIndex
    fee_recipient: Uint256
    shuffling_decision_root: Hash256
```
This is a new message to be introduced to BN<>VC comms. Adding a new BLS signing domain for this method would be prudent.
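A rough sketch of why a separate signing domain matters: mixing a domain into the signing root means a signature over a `ProposalPromise` can never be replayed as a signature over some other object with the same root. The `DOMAIN_PROPOSAL_PROMISE` value below is hypothetical (nothing has been assigned), and `compute_domain` is simplified to zero-padding rather than mixing in fork data as the consensus specs do.

```python
from hashlib import sha256

# Existing domain type for block proposals in the consensus specs.
DOMAIN_BEACON_PROPOSER = bytes.fromhex("00000000")
# Hypothetical value, chosen only for this example.
DOMAIN_PROPOSAL_PROMISE = bytes.fromhex("ff000000")


def compute_domain(domain_type: bytes) -> bytes:
    # Simplification: pad to 32 bytes instead of mixing in fork data.
    return domain_type + b"\x00" * 28


def compute_signing_root(object_root: bytes, domain: bytes) -> bytes:
    # Hash of the 32-byte object root followed by the 32-byte domain.
    return sha256(object_root + domain).digest()
```

The validator would then BLS-sign the resulting root; the signing step itself is left out here since it needs a BLS library.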
The flow might look like this:
```sequence
participant vc
participant bn
participant mev_boost
Title: Proposal Promise
vc->vc: new future proposal
vc->bn: proposal_promise
bn->mev_boost: proposal_promise
```
### Change 3: New Gossipsub Topics
In the previous changes, we talk of the BN sharing messages with mev_boost. We need to consider validator privacy.
Let's assume that mev_boost runs on the same host as the BN. Under the current Flashbots architecture, mev_boost is going to call out to `relay.flashbots.net`, therefore making a direct link between the validator index and the IP address.
A method to mitigate this is to introduce some new, optional gossipsub topics to BNs:
1. `blinded_beacon_block`
1. `proposal_promise`
This would help anonymise the communications. With these channels, we can define the entire flow of Flashbots block production as such:
#### Phase 1: Proposal Promise
1. VC learns that a validator will produce a block in an upcoming slot.
2. The VC sends a `ProposalPromise` to the BN for the upcoming slot.
3. The BN publishes the `ProposalPromise` on the `proposal_promise` topic.
    1. Each relay runs their own BN and receives the `proposal_promise` via gossip.
*It is now unclear to `relay.flashbots.net` as to which BN initiated the `ProposalPromise`.*
#### Phase 2: Block Production
1. Due to a successful `ProposalPromise` broadcast, all builders understand that an `ExecutionPayloadHeader` will be required in slot `s`.
2. At the start of slot `s`, *all** BNs request an `ExecutionPayloadHeader` from mev_boost.
3. All* BNs discard the `ExecutionPayloadHeader`, except the BN that expects to produce a block at slot `s`.
4. The relevant BN produces a `BlindedBeaconBlock` and it is signed by a VC.
5. The BN publishes the `SignedBlindedBeaconBlock` on the `blinded_beacon_block` topic.
    1. Each relay runs their own BN and receives the `SignedBlindedBeaconBlock` via gossip.
6. The relay is responsible for un-blinding the `SignedBlindedBeaconBlock` and publishing a `SignedBeaconBlock` on the gossip network via its own BN.
*It is now unclear to `relay.flashbots.net` as to which BN earnestly requested the `ExecutionPayloadHeader` and which BN produced the `SignedBlindedBeaconBlock`.*
> *: Perhaps this scheme of *all* BNs performing superfluous requests is excessive, but it's trying to provide privacy. Perhaps BNs can "flip a coin" to determine if they'll do a superfluous request.
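The "flip a coin" variant could be sketched like this. The function name and the `noise_probability` parameter are assumptions of mine; choosing a sensible probability is an open question.

```python
import random


def should_request_header(is_proposer: bool, noise_probability: float, rng: random.Random) -> bool:
    # The real proposer always needs a header.
    if is_proposer:
        return True
    # Everyone else makes a superfluous request with some probability,
    # so that a request alone does not identify the proposer.
    return rng.random() < noise_probability
```

Setting `noise_probability` to `1.0` recovers the "all BNs request" scheme above, whilst `0.0` disables the noise entirely.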
## Discussion
### Generic Terminology
This document has used Flashbots-specific terminology. This is not ideal if we're trying to implement a *generic* proposer/builder scheme, without enshrining Flashbots specifically.
I've chosen Flashbots terminology since this idea is nascent and I don't know of any other MEV teams involved in this effort. It seemed simpler to start with Flashbots and then move to a generic system afterwards.
### Simplifying `ProposalPromise`
Presently, `ProposalPromise` must be sent before a block proposal; it cannot be sent periodically. This adds an additional step before block production which may be a point of failure.
It may be prudent to replace `ProposalPromise` with:
```python
class SignedFeeRecipientUpdate:
    epoch: Epoch
    fee_recipient: Uint256
    signature: Signature
```
Nodes could then gossip a `SignedFeeRecipientUpdate` message each epoch to ensure that all relays understand the correct `fee_recipient` for each validator.
Assuming all validators use this system, it would result in at least 480 kbps of additional bandwidth per BN. Reducing the frequency of these messages would reduce bandwidth, but would increase the lag between a validator coming online and having its fee recipient known. Further consideration is required on this front.
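The frequency/bandwidth trade-off can be explored with some back-of-the-envelope arithmetic. The sketch below counts only the raw message bytes (no gossip overhead or duplicate deliveries) using the field sizes implied by `SignedFeeRecipientUpdate`; the validator count is an input, and the numbers are illustrative rather than an attempt to reproduce the figure quoted above.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

# Field sizes in bytes for SignedFeeRecipientUpdate as written above.
EPOCH_BYTES = 8            # uint64
FEE_RECIPIENT_BYTES = 32   # Uint256
SIGNATURE_BYTES = 96       # BLS signature

MESSAGE_BYTES = EPOCH_BYTES + FEE_RECIPIENT_BYTES + SIGNATURE_BYTES


def gossip_bandwidth_bps(validator_count: int, epochs_between_updates: int = 1) -> float:
    # Raw payload bits per second if every validator sends one message per interval.
    interval_seconds = epochs_between_updates * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
    return validator_count * MESSAGE_BYTES * 8 / interval_seconds
```

For a hypothetical 200,000 validators, `gossip_bandwidth_bps(200_000)` comes to roughly 567 kbps of raw payload, and sending every second epoch halves that at the cost of a longer lag.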
### Privacy vs. Simplicity
Change (3) involves adding new gossipsub topics. Arguably, this is the most complex of the changes. The impact on bandwidth needs to be considered, and inconsistent gossip validation conditions could lead to network splits that reach beyond "just" the Flashbots topics. Furthermore, the additional calls between mev_boost and the relays to generate "noise" increase total network bandwidth and consume resources on the relays.
The motivation for adding the gossipsub topics (and the noise scheme) is to make it more difficult for a relay to associate an mev_boost IP address with a validator index. Avoiding linking IP addresses to validators is a bit of an automatic reflex in the Beacon Chain, however it's important to consider *why* we do this.
Primarily, we avoid linking IP addresses to validators since it can pose a liveness risk to the chain. A malicious actor with a list of validator IP addresses could spam or eclipse those validators, restricting the production of blocks or attestations.
One could argue that mev_boost already holds a position in our threat model where it can affect the liveness of the chain. So, allowing it to link IP addresses to validators does not change the threat model significantly. Of course, the threat model is more complex than just this consideration, and this argument alone isn't enough to warrant giving relays unfettered access to the Beacon Chain network topology.
Another mitigation to network privacy is for validators to always run mev_boost on a separate IP address. Then, relays are unaware of the public BN address. Although this works in theory, it requires users to run a more complex and expensive setup. Home stakers are most likely only going to have access to a single IP address.
It's clear to me that the privacy/simplicity trade-off here is complicated. To stimulate debate, I will outline the Flashbots workflow if we assume that a relay knowing the IP address of an mev_boost instance is acceptable:
#### Phase 1: Validator Discovery
First, we introduce the `SignedFeeRecipient`:
```python
class SignedFeeRecipient:
    fee_recipient: Uint256
    signature: Signature
```
Then, on startup and at some interval, each validator in the system performs:
1. Produces a `SignedFeeRecipient` message using its voting BLS keypair.
2. Sends the `SignedFeeRecipient` to the BN.
3. The BN forwards the `SignedFeeRecipient` message to mev_boost.
4. mev_boost forwards the `SignedFeeRecipient` to relay(s).
#### Phase 2: Block Production
1. At the start of the slot, the VC requests a `BlindedBeaconBlock` from the BN.
2. The BN requests an `ExecutionPayloadHeader` from mev_boost. Once the header is received, the BN packages it into a `BlindedBeaconBlock` and returns it to the VC.
3. The VC signs the block, returning a `SignedBlindedBeaconBlock` to the BN.
4. The BN sends the `SignedBlindedBeaconBlock` to mev_boost, which returns an un-blinded `SignedBeaconBlock` to the BN.
5. The BN publishes the `SignedBeaconBlock` on the network.
### Gossip DoS Protection
When sending messages on gossip topics we need to consider DoS attacks, i.e., the gossip validation rules.
The `blinded_beacon_block` topic can be treated exactly the same as the `beacon_block` topic, therefore it's no worse than the current status-quo.
The addition of the `shuffling_decision_root` to the `ProposalPromise` means that we can also police the `proposal_promise` topic in a very similar way to the current `beacon_block` topic; only allow messages from validators which are scheduled to propose at that slot.
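As a rough sketch of what such a validation rule might look like, assume the BN keeps a cache of expected proposers keyed by `(shuffling_decision_root, slot)`. The cache shape, the function name and the `accept`/`ignore`/`reject` outcomes are assumptions for illustration, and signature verification is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ProposalPromise:
    slot: int
    validator_index: int
    fee_recipient: bytes
    shuffling_decision_root: bytes


# Hypothetical cache: (shuffling_decision_root, slot) -> expected proposer index.
ProposerCache = Dict[Tuple[bytes, int], int]


def validate_proposal_promise(promise: ProposalPromise, proposers: ProposerCache, current_slot: int) -> str:
    # Promises for past slots are useless to builders.
    if promise.slot < current_slot:
        return "ignore"
    expected = proposers.get((promise.shuffling_decision_root, promise.slot))
    # Unknown shuffling: we cannot judge the message, so do not propagate it.
    if expected is None:
        return "ignore"
    # Only the scheduled proposer may publish a promise for this slot.
    if expected != promise.validator_index:
        return "reject"
    # Signature verification would go here; omitted in this sketch.
    return "accept"
```

This bounds the topic to at most one useful message per slot per shuffling, which is the same property that keeps the `beacon_block` topic manageable.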
### Is mev_boost Middleware?
There has been discussion about making mev_boost a "middleware" that sits between a consensus client and an execution client. I can see the merits for this, but I am not convinced it is an ideal solution for these reasons:
1. Middleware may have issues with composability. If some future component (e.g., EL diversity middleware) *also* wants to be middleware, how will those two middleware interact?
2. Middleware assumes a separation between EL and CL with a HTTP API between. Whilst that seems likely for the time being, it's hard to say if things will stay that way.
3. Most importantly, using mev_boost as the execution client means it is responsible for both liveness *and* correctness.
If middleware looks like this:
```
--------------------      -------------      --------------------
| Consensus Client | <--> | mev_boost | <--> | Execution Client |
--------------------      -------------      --------------------
```
I propose something more like this:
```
--------------------      -------------
|                  | <--> | mev_boost |
|                  |      -------------
| Consensus Client |
|                  |      --------------------
|                  | <--> | Execution Client |
--------------------      --------------------
```
With the non-middleware solution, the consensus client can rely upon mev_boost to *produce* `ExecutionPayloads` whilst relying on the execution client to *verify* `ExecutionPayloads`. This means that a malicious mev_boost can only halt the chain temporarily, but it cannot cause the consensus client to import (and finalize) invalid `ExecutionPayloads`.
Additionally, this gives the consensus client the ability to detect if an mev_boost-created block is deemed invalid and then break its relationship with it. This could allow the consensus client to break a liveness fault caused by mev_boost after the first invalid block is produced.
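A minimal sketch of that "produce with mev_boost, verify with the execution client, fall back on failure" logic is below. The `PayloadSelector` class and its callables are hypothetical names, payloads are modelled as opaque bytes, and the sketch glosses over blinding (in the blinded flow the full payload only becomes available for verification after signing).

```python
from typing import Callable, Optional

Payload = bytes


class PayloadSelector:
    def __init__(
        self,
        builder_payload: Callable[[], Optional[Payload]],  # mev_boost; None means missing
        local_payload: Callable[[], Payload],              # local execution client
        local_verify: Callable[[Payload], bool],           # local execution client
    ) -> None:
        self.builder_payload = builder_payload
        self.local_payload = local_payload
        self.local_verify = local_verify
        self.builder_trusted = True

    def next_payload(self) -> Payload:
        if self.builder_trusted:
            payload = self.builder_payload()
            # mev_boost is only relied upon for production, never verification.
            if payload is not None and self.local_verify(payload):
                return payload
            # Missing or invalid payload: break the relationship with mev_boost.
            self.builder_trusted = False
        # Fall back to the local execution client.
        return self.local_payload()
```

How (and whether) the relationship is ever restored is a policy question that this sketch doesn't attempt to answer.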
## Summary
This document does not outline a final system. Instead, it draws a broad outline around a set of desired features and discusses their benefits, complexities and drawbacks. There are still open questions remaining, like the big question of validator/relay privacy. There are also countless small details remaining. However, I think it's clear that a post-merge solution is achievable.
It seems likely that Flashbots is going to be a crucial component of post-merge Ethereum. It's clear to me that consensus client developers can play a big role in helping Flashbots ship a safe and stable product.
One of the core principles of the Beacon Chain is *client-diversity*. This means avoiding the scenario where a single code path takes down the chain. It appears that the most profitable path for post-merge validators will be to use Flashbots, so we must use our foresight to prevent Flashbots becoming [the rock upon which all else rests](https://www.explainxkcd.com/wiki/index.php/2347:_Dependency).
I think the ideal solution to this would be for each consensus client to implement *Flashbots as a first-class citizen*. This allows the consensus client to perform two important duties:
1. Using a separate, non-Flashbots execution client for block *verification*. Without this, a Flashbots fault could result in a finalized Beacon Chain with a bad `ExecutionPayload`.
2. Allowing for fallback to a non-Flashbots execution client upon a liveness failure from Flashbots (e.g., invalid or missing payloads). Without this, a Flashbots fault could result in a Beacon Chain deadlock.
I think those two changes are *fundamental* to maintaining diversity on the Beacon Chain and are within reach of consensus teams today. The additional features like new gossip topics and `fee_recipient` signatures have desirable properties, but I think they need to be weighed against the merge timeline.
I hope this document can provide enough information to start an informed discussion on the topics I've raised here.