Flashbots for Ethereum Consensus Clients

Introduction

Flashbots

Flashbots need no introduction, but their primary focus is "to enable a permissionless, transparent, and fair ecosystem for MEV extraction."

As of November 2021, Flashbots claim that 80% of the Ethereum hashrate accepts Flashbots bundles and that this increases the block reward by 0.3 ETH on average. The Flashbots team maintain mev-geth, a fork of geth modified to work with Flashbots.

Objectively, Flashbots is a major infrastructure-level Ethereum application. Subjectively, it may be strategic to make Flashbots as safe and as client-diverse as possible.

This document assumes that it is a good idea for at least some consensus clients to work together to provide out-of-the-box support for Flashbots and other proposer/builder separation schemes.

Document Purpose

The intention of this document is to share information about the Flashbots architecture with consensus client teams and start a discussion about how Flashbots and other imminently-achievable proposer/builder separation schemes can work in multi-client, post-merge Ethereum.

Contributors

The ideas in this document are not entirely my own, far from it. A majority of these concepts have come from the Flashbots team. Some have come from Ethereum Foundation researchers. Feel free to contact me if you'd like to have your name added to this document.

The Flashbots Objective

The Flashbots POS: Merge Architecture does a good job of outlining the basics of Flashbots post-merge. That being said, I think there is more iteration to be done before a final design is found.

In summary, after the merge Flashbots expects to have the following actors:

Users/Searchers: Maintain public and private tx pools, create bundles.
Builders: Receive bundles and signed transactions from users/searchers and produce execution payloads. Builders are trusted by users/searchers.
Relays: Route execution payloads between builders and validators. Relays are trusted by both builders and validators.
Validators: Receive execution payload headers from relays and produce signed blocks containing those execution payload headers. Validators trust relays.

The high-level objective of the Flashbots PoS architecture is to allow validators to outsource the construction of an ExecutionPayload to a distributed network of builders.

From a consensus client perspective, this can be achieved by adding an additional component to the CL+EL combination which communicates with the network of relays, builders and users/searchers. In the Flashbots architecture, this component is named mev_boost. The layout of these components is discussed later.

Changes to Consensus Clients

Supporting a reasonably secure Flashbots implementation involves significant changes to consensus clients.

It should be understood that the intention of these changes isn't to enshrine Flashbots in the protocol. Rather, it's to provide some infrastructure to allow schemes like Flashbots to operate safely.

Change 1: Blind Transaction Signing

The Flashbots system requires that a validator signs a block without knowing the transactions included in it. This prevents the validator from including only a subset of the transactions, causing harm to other actors in the system (e.g., builders, searchers).

Hold up, wait a minute... Doesn't that mean mev_boost can affect the liveness of the chain? Yes, it does. That's the space we're working in here. A long-lasting liveness failure can be mitigated to some degree via a reputation scheme, discussed briefly later.

Blinding can be achieved by having the validator sign over a BeaconBlock that has the type ExecutionPayloadHeader for the body.execution_payload field (rather than ExecutionPayload).

A more generic blinding implementation involves the validator signing over a BeaconBlockHeader, which hides the entire BeaconBlockBody.

Without choosing between the two blinding methods, let's call a block with hidden transactions a BlindedBeaconBlock, and its signed form a SignedBlindedBeaconBlock.
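Blinding works because the header can commit to the full payload. Below is a minimal sketch under a toy hash scheme: plain SHA-256 over concatenated 32-byte fields stands in for SSZ `hash_tree_root`, and only three payload fields are modelled. Because the header replaces the transaction list with a root over it, the header and the full payload hash to the same root, so a signature over a block containing the header also commits to the hidden transactions. The real SSZ `ExecutionPayloadHeader` and `ExecutionPayload` containers share this property.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import List


def toy_root(*chunks: bytes) -> bytes:
    """Toy commitment: SHA-256 over concatenated 32-byte chunks (not real SSZ)."""
    return sha256(b"".join(chunks)).digest()


def compute_transactions_root(transactions: List[bytes]) -> bytes:
    """Commit to an ordered list of raw transactions with a single 32-byte root."""
    return toy_root(*(sha256(tx).digest() for tx in transactions))


@dataclass
class ExecutionPayload:
    parent_hash: bytes          # 32 bytes
    fee_recipient: bytes        # padded to 32 bytes in this toy model
    transactions: List[bytes]   # the part the validator must not see

    def root(self) -> bytes:
        return toy_root(self.parent_hash, self.fee_recipient,
                        compute_transactions_root(self.transactions))


@dataclass
class ExecutionPayloadHeader:
    parent_hash: bytes
    fee_recipient: bytes
    transactions_root: bytes    # replaces the transaction list

    def root(self) -> bytes:
        return toy_root(self.parent_hash, self.fee_recipient, self.transactions_root)


def blind(payload: ExecutionPayload) -> ExecutionPayloadHeader:
    """Produce the header a builder/relay would hand to the proposer."""
    return ExecutionPayloadHeader(payload.parent_hash, payload.fee_recipient,
                                  compute_transactions_root(payload.transactions))


payload = ExecutionPayload(b"\x01" * 32, b"\x02" * 32, [b"tx-a", b"tx-b"])
header = blind(payload)
assert header.root() == payload.root()            # signing the header commits to the payload
tampered = ExecutionPayload(payload.parent_hash, payload.fee_recipient, [b"tx-a"])
assert blind(tampered).root() != header.root()    # dropping a transaction changes the root
```

A proposer that signs over `header.root()` (inside its block) has effectively signed `payload.root()`, which is why the hidden payload can later be attached without a new signature.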

That means the flow of block production appears like this:

Change 2: Proposal Promises

In order to produce an ExecutionPayload, the builder needs to know the head_block_hash, timestamp, random and feeRecipient. The builder can infer all of those values itself, except the feeRecipient, which must be specified by the validator.

Note: there are several competing methods for sharing the feeRecipient with builders. The method below shouldn't be considered canonical; other methods are discussed later in the document.

Therefore, the proposer for each slot must share the feeRecipient with all the builders sometime before the slot arrives. To prevent spam from non-validators, validators are expected to sign this message and share it with builders:

class ProposalPromise:
    slot: Slot
    validator_index: ValidatorIndex
    fee_recipient: Uint256
    shuffling_decision_root: Hash256

This is a new message to be introduced to BN<>VC communications. Adding a new BLS signing domain for this message would be prudent.
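A separate domain means a validator's signature over a ProposalPromise can never be valid for a block (or any other message) that happens to share the same root. The sketch below mirrors the consensus-spec helpers `compute_domain` and `compute_signing_root`, specialised to 32-byte inputs. The BLS signing step itself is omitted because it needs a BLS library, and `DOMAIN_PROPOSAL_PROMISE` and its value are hypothetical.

```python
from hashlib import sha256

# Existing phase0 domain type for block proposals.
DOMAIN_BEACON_PROPOSER = bytes.fromhex("00000000")
# Hypothetical domain type for ProposalPromise; this value is an assumption
# made for illustration, not an allocated constant.
DOMAIN_PROPOSAL_PROMISE = bytes.fromhex("f0000000")


def compute_fork_data_root(fork_version: bytes, genesis_validators_root: bytes) -> bytes:
    """hash_tree_root(ForkData): two chunks, the 4-byte version zero-padded to 32."""
    assert len(fork_version) == 4 and len(genesis_validators_root) == 32
    return sha256(fork_version + b"\x00" * 28 + genesis_validators_root).digest()


def compute_domain(domain_type: bytes, fork_version: bytes,
                   genesis_validators_root: bytes) -> bytes:
    """4-byte domain type followed by the first 28 bytes of the fork data root."""
    return domain_type + compute_fork_data_root(fork_version, genesis_validators_root)[:28]


def compute_signing_root(object_root: bytes, domain: bytes) -> bytes:
    """hash_tree_root(SigningData(object_root, domain)) for two 32-byte fields."""
    assert len(object_root) == 32 and len(domain) == 32
    return sha256(object_root + domain).digest()


# Stand-in for hash_tree_root(ProposalPromise); a real client would use SSZ.
promise_root = sha256(b"example ProposalPromise").digest()
fork_version, genesis_validators_root = bytes(4), bytes(32)
promise_signing_root = compute_signing_root(
    promise_root, compute_domain(DOMAIN_PROPOSAL_PROMISE, fork_version, genesis_validators_root))
block_signing_root = compute_signing_root(
    promise_root, compute_domain(DOMAIN_BEACON_PROPOSER, fork_version, genesis_validators_root))
assert promise_signing_root != block_signing_root  # same root, different signed bytes
```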

The flow might look like this:

Change 3: New Gossipsub Topics

In the previous changes, we talk of the BN sharing messages with mev_boost. We need to consider validator privacy.

Let's assume that mev_boost runs on the same host as the BN. Under the current Flashbots architecture, mev_boost is going to call out to relay.flashbots.net, thereby creating a direct link between the validator index and the IP address.

One way to mitigate this is to introduce some new, optional gossipsub topics to BNs:

blinded_beacon_block
proposal_promise

This would help anonymise the communications. With these topics, we can define the entire flow of Flashbots block production as follows:

Phase 1: Proposal Promise

VC learns that a validator will produce a block in an upcoming slot.
The VC sends a ProposalPromise to the BN for the upcoming slot.
The BN publishes the ProposalPromise on the proposal_promise topic.
Each relay runs their own BN and receives the proposal_promise via gossip.

It is now unclear to relay.flashbots.net as to which BN initiated the ProposalPromise.

Phase 2: Block Production

Due to a successful ProposalPromise broadcast, all builders understand that an ExecutionPayloadHeader will be required in slot s.
At the start of slot s, all* BNs request an ExecutionPayloadHeader from mev_boost.
All* BNs discard the ExecutionPayloadHeader, except the BN that expects to produce a block at slot s.
The relevant BN produces a BlindedBeaconBlock and it is signed by a VC.
The BN publishes the SignedBlindedBeaconBlock on the blinded_beacon_block topic.
Each relay runs their own BN and receives the SignedBlindedBeaconBlock via gossip.
The relay is responsible for un-blinding the SignedBlindedBeaconBlock and publishing a SignedBeaconBlock on the gossip network via its own BN.
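The relay's side of this phase can be sketched as a lookup from header root to the payload it is withholding. In this hypothetical sketch the relay reveals a payload only for a blinded block that carries a valid proposer signature over a header it issued. `is_valid_signature` is a caller-supplied placeholder for BLS verification against the expected proposer, and a SHA-256 digest stands in for the header root.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class ExecutionPayload:
    block_hash: bytes
    transactions: Tuple[bytes, ...]


def header_root(payload: ExecutionPayload) -> bytes:
    """Stand-in for hash_tree_root(ExecutionPayloadHeader): commits to every tx."""
    tx_root = sha256(b"".join(sha256(tx).digest() for tx in payload.transactions)).digest()
    return sha256(payload.block_hash + tx_root).digest()


@dataclass(frozen=True)
class SignedBlindedBeaconBlock:
    slot: int
    proposer_index: int
    header_root: bytes
    signature: bytes


@dataclass(frozen=True)
class SignedBeaconBlock:
    slot: int
    proposer_index: int
    execution_payload: ExecutionPayload
    signature: bytes  # reused unchanged from the blinded block


class Relay:
    def __init__(self, is_valid_signature: Callable[[SignedBlindedBeaconBlock], bool]) -> None:
        self._is_valid_signature = is_valid_signature
        self._withheld: Dict[bytes, ExecutionPayload] = {}

    def accept_from_builder(self, payload: ExecutionPayload) -> bytes:
        """Withhold the payload and return the header root offered to the proposer."""
        root = header_root(payload)
        self._withheld[root] = payload
        return root

    def unblind(self, blinded: SignedBlindedBeaconBlock) -> Optional[SignedBeaconBlock]:
        """Return the full block to publish, or None if it must not be revealed."""
        payload = self._withheld.get(blinded.header_root)
        if payload is None or not self._is_valid_signature(blinded):
            return None
        return SignedBeaconBlock(blinded.slot, blinded.proposer_index,
                                 payload, blinded.signature)


relay = Relay(is_valid_signature=lambda block: block.signature == b"valid")
root = relay.accept_from_builder(ExecutionPayload(b"\xaa" * 32, (b"tx-1",)))
full = relay.unblind(SignedBlindedBeaconBlock(100, 42, root, b"valid"))
```

Checking the signature before revealing matters: otherwise anyone could obtain the withheld transactions just by asking for them.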

It is now unclear to relay.flashbots.net as to which BN earnestly requested the ExecutionPayloadHeader and which BN produced the SignedBlindedBeaconBlock.

*: Perhaps this scheme of all BNs performing superfluous requests is excessive, but it's trying to provide privacy. Perhaps BNs can "flip a coin" to determine if they'll do a superfluous request.
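The "flip a coin" idea can be sketched as below. `noise_probability` is a hypothetical tuning knob: 1.0 reproduces the all-BNs scheme, while smaller values trade some privacy for less load on the relays. Injecting a `random.Random` is a choice made here for testability; a deployment would want an unpredictable source such as `random.SystemRandom`.

```python
import random


def should_request_header(is_proposer: bool, noise_probability: float,
                          rng: random.Random) -> bool:
    """Decide whether this BN asks mev_boost for an ExecutionPayloadHeader this slot.

    The scheduled proposer always asks. Every other BN asks with probability
    noise_probability, producing decoy requests the relay cannot tell apart.
    """
    if not 0.0 <= noise_probability <= 1.0:
        raise ValueError("noise_probability must be between 0 and 1")
    if is_proposer:
        return True
    return rng.random() < noise_probability


rng = random.Random(2021)
# Roughly 10% of 10,000 non-proposing BNs send a decoy request.
decoys = sum(should_request_header(False, 0.1, rng) for _ in range(10_000))
```

With this scheme the expected number of decoys per slot is `noise_probability` times the number of non-proposing BNs, so the relay load scales linearly with the chosen probability.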

Discussion

Generic Terminology

This document has used Flashbots-specific terminology. This is not ideal if we're trying to implement a generic proposer/builder scheme, without enshrining Flashbots specifically.

I've chosen Flashbots terminology since this idea is nascent and I don't know of any other MEV teams involved in this effort. It seemed simpler to start with Flashbots and then move to a generic system afterwards.

Simplifying `ProposalPromise`

Presently, a ProposalPromise must be sent before each block proposal; it cannot be sent periodically. This adds an additional step before block production, which may be a point of failure.

It may be prudent to replace ProposalPromise with:

class SignedFeeRecipientUpdate:
    validator_index: ValidatorIndex
    epoch: Epoch
    fee_recipient: Uint256
    signature: Signature

Nodes could then gossip a SignedFeeRecipientUpdate message each epoch to ensure that all relays understand the correct fee_recipient for each validator.
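A relay (or any node gossiping these updates) would only need to keep the newest valid update per validator, which also gives a cheap duplicate filter. A minimal sketch, assuming each update carries a `validator_index` so the receiver knows which public key to verify against, with `verify` standing in for BLS verification:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SignedFeeRecipientUpdate:
    validator_index: int   # assumption: identifies whose key signed the update
    epoch: int
    fee_recipient: int     # Uint256, as in the message definition
    signature: bytes


class FeeRecipientRegistry:
    def __init__(self, verify: Callable[[SignedFeeRecipientUpdate], bool]) -> None:
        self._verify = verify
        self._latest: Dict[int, SignedFeeRecipientUpdate] = {}

    def on_update(self, update: SignedFeeRecipientUpdate) -> bool:
        """Store the update if it is newer and validly signed; True means re-gossip."""
        current = self._latest.get(update.validator_index)
        if current is not None and update.epoch <= current.epoch:
            return False  # stale or duplicate: checked first because it is cheap
        if not self._verify(update):
            return False
        self._latest[update.validator_index] = update
        return True

    def fee_recipient(self, validator_index: int) -> Optional[int]:
        update = self._latest.get(validator_index)
        return None if update is None else update.fee_recipient
```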

Assuming all validators use this system, it would result in at least 480kbps of additional bandwidth per BN. Reducing the frequency of these messages would reduce bandwidth, but would increase the lag between a validator coming online and its fee recipient becoming known. Further consideration is required on this front.
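The bandwidth cost scales linearly with the validator count, the message size and the update frequency, which makes the trade-off easy to explore. The helper below is a back-of-envelope sketch. The text does not state the inputs behind the 480kbps figure, so every argument, including `amplification` (a rough stand-in for gossipsub forwarding overhead), is an assumption to be filled in.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def fee_recipient_gossip_kbps(num_validators: int, message_bytes: int,
                              updates_per_epoch: float = 1.0,
                              amplification: float = 1.0) -> float:
    """Back-of-envelope extra bandwidth, in kilobits per second, for one BN.

    amplification is a rough multiplier for gossip overhead (each message
    crossing this node's links more than once); 1.0 counts each message once.
    """
    epoch_seconds = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 seconds
    bits_per_epoch = (num_validators * updates_per_epoch * message_bytes
                      * 8 * amplification)
    return bits_per_epoch / epoch_seconds / 1000


# Hypothetical inputs: 100,000 validators and 200-byte messages on the wire.
every_epoch = fee_recipient_gossip_kbps(100_000, 200)
every_4_epochs = fee_recipient_gossip_kbps(100_000, 200, updates_per_epoch=0.25)
```

Quartering the update frequency quarters the bandwidth, at the cost of up to four epochs (about 25.6 minutes) before a newly online validator's fee recipient is known.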

Privacy vs. Simplicity

Change (3) involves adding new gossipsub topics. Arguably, this is the most complex of the changes. The impact on bandwidth needs to be considered, and inconsistent gossip validation conditions could lead to network splits that reach beyond "just" the Flashbots topics. Furthermore, the additional calls between mev_boost and the relays to generate "noise" increase total network bandwidth and consume resources on the relays.

The motivation for adding the gossipsub topics (and the noise scheme) is to make it more difficult for a relay to associate an mev_boost IP address with a validator index. Avoiding linking IP addresses to validators is a bit of an automatic reflex in the Beacon Chain; however, it's important to consider why we do this.

Primarily, we avoid linking IP addresses to validators because it can pose a liveness risk to the chain. A malicious actor with a list of validator IP addresses could spam or eclipse those validators, restricting the production of blocks or attestations.

One could argue that mev_boost already holds a position in our threat model where it can affect the liveness of the chain. So, allowing it to link IP addresses to validators does not change the threat model significantly. Of course, the threat model is more complex than just this consideration, and this argument alone isn't enough to justify granting relays unfettered access to the Beacon Chain network topology.

Another way to preserve network privacy is for validators to always run mev_boost on a separate IP address. Then, relays are unaware of the BN's public address. Although this works in theory, it requires users to run a more complex and expensive setup. Home stakers are most likely only going to have access to a single IP address.

It's clear to me that the privacy/simplicity trade-off here is complicated. To stimulate debate, I will outline the Flashbots workflow if we assume that a relay knowing the IP address of an mev_boost instance is acceptable:

Phase 1: Validator Discovery

First, we introduce the SignedFeeRecipient:

class SignedFeeRecipient:
    validator_index: ValidatorIndex
    fee_recipient: Uint256
    signature: Signature

Then, on startup and at some interval, each validator in the system performs:

Produces a SignedFeeRecipient message using its voting BLS keypair.
Sends the SignedFeeRecipient to the BN.
The BN forwards the SignedFeeRecipient message to mev_boost.
mev_boost forwards the SignedFeeRecipient to relay(s).

Phase 2: Block Production

At the start of the slot, the VC requests a BlindedBeaconBlock from the BN.
The BN requests an ExecutionPayloadHeader from mev_boost. Once the header is received, the BN packages it into a BlindedBeaconBlock and returns it to the VC.
The VC signs the block, returning a SignedBlindedBeaconBlock to the BN.
The BN sends the SignedBlindedBeaconBlock to mev_boost, which returns an un-blinded SignedBeaconBlock to the BN.
The BN publishes the SignedBeaconBlock on the network.
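The steps above can be strung together as below. This is a sketch with mock components: `MockMevBoost`, `BeaconNode` and `propose` are hypothetical names, a SHA-256 digest stands in for the ExecutionPayloadHeader, and the VC's BLS signature is a caller-supplied function.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Callable, List


@dataclass(frozen=True)
class BlindedBeaconBlock:
    slot: int
    proposer_index: int
    payload_header: bytes      # stand-in for an ExecutionPayloadHeader


@dataclass(frozen=True)
class SignedBlindedBeaconBlock:
    message: BlindedBeaconBlock
    signature: bytes


@dataclass(frozen=True)
class SignedBeaconBlock:
    slot: int
    proposer_index: int
    execution_payload: bytes   # stand-in for the revealed ExecutionPayload
    signature: bytes


class MockMevBoost:
    """Hypothetical mev_boost: offers a header, reveals the payload once signed."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def get_header(self, slot: int) -> bytes:
        return sha256(self._payload).digest()  # commits to, but hides, the payload

    def reveal(self, signed: SignedBlindedBeaconBlock) -> SignedBeaconBlock:
        if signed.message.payload_header != sha256(self._payload).digest():
            raise ValueError("signed block does not commit to the offered payload")
        return SignedBeaconBlock(signed.message.slot, signed.message.proposer_index,
                                 self._payload, signed.signature)


class BeaconNode:
    def __init__(self, mev_boost: MockMevBoost) -> None:
        self.mev_boost = mev_boost
        self.published: List[SignedBeaconBlock] = []

    def produce_blinded_block(self, slot: int, proposer_index: int) -> BlindedBeaconBlock:
        # The BN fetches a header from mev_boost and wraps it for the VC.
        return BlindedBeaconBlock(slot, proposer_index, self.mev_boost.get_header(slot))

    def submit(self, signed: SignedBlindedBeaconBlock) -> None:
        # mev_boost un-blinds the signed block; the BN then publishes it.
        self.published.append(self.mev_boost.reveal(signed))


def propose(bn: BeaconNode, slot: int, proposer_index: int,
            sign: Callable[[BlindedBeaconBlock], bytes]) -> None:
    """The VC's part: request a blinded block, sign it, hand it back."""
    block = bn.produce_blinded_block(slot, proposer_index)
    bn.submit(SignedBlindedBeaconBlock(block, sign(block)))


bn = BeaconNode(MockMevBoost(b"builder-txs"))
propose(bn, slot=1, proposer_index=3, sign=lambda block: b"sig")
```

Note that the VC only ever sees the header digest; the payload bytes first appear in the block that the BN publishes.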

Gossip DoS Protection

When sending messages on gossip topics we need to consider DoS attacks, i.e., the gossip validation rules.

The blinded_beacon_block topic can be treated exactly the same as the beacon_block topic; therefore, it's no worse than the current status quo.

The addition of the shuffling_decision_root to the ProposalPromise means that we can also police the proposal_promise topic in a very similar way to the current beacon_block topic: only allow messages from validators that are scheduled to propose at that slot.
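By analogy with the beacon_block rules, a BN could validate proposal_promise messages roughly as follows. This is a sketch, not a specification: the rule ordering, the IGNORE/REJECT split, the `max_lookahead_slots` bound and the `proposer_for` lookup (returning the scheduled proposer for a slot under a given shuffling decision root, or None if that root is unknown) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Set, Tuple


class Verdict(Enum):
    ACCEPT = "accept"   # valid: forward to peers
    IGNORE = "ignore"   # drop without penalising the sender
    REJECT = "reject"   # invalid: drop and penalise the sender


@dataclass(frozen=True)
class ProposalPromise:
    slot: int
    validator_index: int
    fee_recipient: int
    shuffling_decision_root: bytes


def validate_proposal_promise(
    promise: ProposalPromise,
    current_slot: int,
    proposer_for: Callable[[bytes, int], Optional[int]],
    signature_ok: Callable[[ProposalPromise], bool],
    seen: Set[Tuple[int, int]],
    max_lookahead_slots: int = 64,
) -> Verdict:
    # Only promises for the current or upcoming slots, within a bounded look-ahead.
    if not current_slot <= promise.slot <= current_slot + max_lookahead_slots:
        return Verdict.IGNORE
    # Forward at most one promise per (validator, slot).
    if (promise.validator_index, promise.slot) in seen:
        return Verdict.IGNORE
    expected = proposer_for(promise.shuffling_decision_root, promise.slot)
    if expected is None:
        return Verdict.IGNORE  # unknown shuffling: cannot judge the promise yet
    if expected != promise.validator_index:
        return Verdict.REJECT  # sender is not the scheduled proposer
    if not signature_ok(promise):
        return Verdict.REJECT
    seen.add((promise.validator_index, promise.slot))
    return Verdict.ACCEPT
```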

Is mev_boost Middleware?

There has been discussion about making mev_boost a "middleware" that sits between a consensus client and an execution client. I can see the merits for this, but I am not convinced it is an ideal solution for these reasons:

Middleware may have issues with composability. If some future component (e.g., EL diversity middleware) also wants to be middleware, how will those two middleware interact?
Middleware assumes a separation between EL and CL with an HTTP API between them. Whilst that seems likely for the time being, it's hard to say if things will stay that way.
Most importantly, using mev_boost as the execution client means it is responsible for both liveness and correctness.

If middleware looks like this:

 ------------------        -----------        ------------------
| Consensus Client | <--> | mev_boost | <--> | Execution Client |
 ------------------        -----------        ------------------

I propose something more like this:

 ------------------        -----------
|                  | <--> | mev_boost |
|                  |       -----------
| Consensus Client |
|                  |       ------------------
|                  | <--> | Execution Client |
 ------------------        ------------------

With the non-middleware solution, the consensus client can rely upon mev_boost to produce ExecutionPayloads whilst relying on the execution client to verify ExecutionPayloads. This means that a malicious mev_boost can only halt the chain temporarily, but it cannot cause the consensus client to import (and finalize) invalid ExecutionPayloads.

Additionally, this gives the consensus client the ability to detect if an mev_boost-created block is deemed invalid and then break its relationship with mev_boost. This could allow the consensus client to recover from a liveness fault caused by mev_boost after the first invalid block is produced.
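The consensus client's side of this arrangement can be sketched as a small policy object. In this hypothetical sketch, `fetch_from_mev_boost` returns None when mev_boost fails to deliver in time, and `build_locally` asks the CL's own execution client for a payload. Because a blinded payload can only be checked once it is revealed, the local EL's verdict arrives via `on_mev_boost_block_verified` after the block has been imported.

```python
from typing import Callable, Optional, Tuple


class PayloadSource:
    """Hypothetical policy: prefer mev_boost, fall back to the local execution client."""

    def __init__(self,
                 fetch_from_mev_boost: Callable[[int], Optional[bytes]],
                 build_locally: Callable[[int], bytes]) -> None:
        self._fetch_from_mev_boost = fetch_from_mev_boost
        self._build_locally = build_locally
        self.mev_boost_trusted = True

    def get_payload(self, slot: int) -> Tuple[bytes, bool]:
        """Return (payload, came_from_mev_boost)."""
        if self.mev_boost_trusted:
            payload = self._fetch_from_mev_boost(slot)
            if payload is not None:
                return payload, True
        # mev_boost is distrusted or produced nothing: avoid a missed proposal.
        return self._build_locally(slot), False

    def on_mev_boost_block_verified(self, valid: bool) -> None:
        """Record the local EL's verdict on an imported mev_boost-built block."""
        if not valid:
            self.mev_boost_trusted = False  # break the relationship


source = PayloadSource(lambda slot: b"mev-payload", lambda slot: b"local-payload")
first = source.get_payload(1)
source.on_mev_boost_block_verified(valid=False)
second = source.get_payload(2)
```

The trust flag here is permanent for simplicity; a real client would probably pair it with a way to restore trust, such as the reputation scheme mentioned earlier or operator intervention.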

Summary

This document does not outline a final system. Instead, it draws a broad outline around a set of desired features and discusses their benefits, complexities and drawbacks. There are still open questions, like the big question of validator/relay privacy, and countless small details remain. However, I think it's clear that a post-merge solution is achievable.

It seems likely that Flashbots is going to be a crucial component of post-merge Ethereum. It's clear to me that consensus client developers can play a big role in helping Flashbots ship a safe and stable product.

One of the core principles of the Beacon Chain is client diversity. This means avoiding the scenario where a single code path takes down the chain. It appears that the most profitable path for post-merge validators will be to use Flashbots, so we must use our foresight to prevent Flashbots from becoming the rock upon which all else rests.

I think the ideal solution to this would be for each consensus client to implement Flashbots as a first-class citizen. This allows the consensus client to perform two important duties:

Using a separate, non-Flashbots execution client for block verification. Without this, a Flashbots fault could result in a finalized Beacon Chain with a bad ExecutionPayload.
Allowing for fallback to a non-Flashbots execution client upon a liveness failure from Flashbots (e.g., invalid or missing payloads). Without this, a Flashbots fault could result in a Beacon Chain deadlock.

I think those two changes are fundamental to maintaining diversity on the Beacon Chain, and they are within reach of consensus teams today. Additional features, like new gossip topics and fee_recipient signatures, have desirable properties, but I think they need to be weighed against the merge timeline.

I hope this document can provide enough information to start an informed discussion on the topics I've raised here.
