
Aestus Timing Games as a Service

Auston Sterling
Thank you to Max/KuDeTa and Ladislaus for review

In this document we describe the Aestus Relay's approach to MEV-Boost timing games as a service (TGaaS). In short:

  • Block proposal timing games are unavoidable; at this point the best approach is ensuring democratized access to high-quality timing management tools.
  • Aestus will apply a safe delay to all getHeader requests coming from validators identified by user agent. See below.
  • Aestus's default timing games implementation results in a median delay of 735 ms.
  • Validators looking to be more conservative or more aggressive may customize parameters by appending ?headerDelay={ms}&headerCutoff={ms} to the Aestus listing in their mev-boost relay list. See below.
  • We encourage staking pools and relays to be transparent about timing games.

Background

We recommend that readers familiarize themselves with MEV-Boost and the concept of timing games before diving in.

Probability density function (PDF) of the annual MEV increase in Chorus One's Adagio pilot when using timing game strategies. The median increase is 4.75%. Since timing games are zero-sum, this is potential value coming out of the pockets of honest validators, straight into those of actors with sufficient resources to develop and optimize their timing game strategy.
Source: The cost of artificial latency in the PBS context

Timing Games as a Service

Timing games have seen continued adoption by validator node operators. However, relays have the potential to influence timing games on a larger scale. Social coordination between relays to agree on an early cutoff to getHeader calls would force validators onto level ground, but such coordination would be an unstable equilibrium and appears unlikely. Instead, while relays have no control over when the proposer calls the getHeader API endpoint, a relay could choose to deliberately delay the top bid lookup and response to the client. The relay essentially provides timing game functionality as a service to connected validators (TGaaS).

Relay-Side Timing Games

In fact, we believe that relays are the best place to implement timing games. One of the main concerns with timing games is that they widen the APR gap between sophisticated and unsophisticated validator node operators, creating a centralizing pressure on the validator set. To combat this, home/solo stakers need access to simple tools that can close most of that gap. MEV-Boost relays are perfectly positioned to develop TGaaS systems that democratize access to timing games in the same way that MEV-Boost originally democratized access to MEV as a whole. Additionally, relays can implement TGaaS as more of a one-way "push" of the top bid from relay to proposer, cutting latency in half.

Around the beginning of 2024, shifts in winning bid arrival times could be seen on a per-relay basis. BloXroute's winning bids have a clear shift later into the slot. A similar shift can be observed with ultrasound's distribution, though they have not described their strategy.
Source: Latency is Money: Timing Games /acc, by Data Always

The BloXroute relay is openly engaging in timing games. BloXroute has advertised this as a component of their full-featured Validator Gateway offering. However, as a paid service it presents a barrier to entry. Other relays may be experimenting with timing games, but so far none have detailed their strategies publicly. As relays continue to develop their plans, we encourage transparency and communication so stakers can make informed decisions.

A few relays' adoption of timing games forces the rest to also play along in order to remain competitive. Most stakers connect to multiple relays, and the proposer's mev-boost client will wait for all relays to respond (up to 950 ms). So if one relay delays while another does not, block proposal is delayed anyway, but the bid from the relay that did not delay may be nearly a full second old. The older bid may be less valuable, and may be a bid that the builder has since cancelled. The risk of delivering stale, cancelled bids may lead builders to only send their blocks to relays with sufficiently aggressive timing games or other solutions.
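The stale-bid effect described above can be illustrated with a little arithmetic. The following is a minimal sketch, not mev-boost's actual logic: it assumes each relay's bid is fresh at the moment the relay responds, ignores network latency, and uses a hypothetical helper name (`bid_ages_ms`) and hypothetical relay labels.

```python
def bid_ages_ms(response_times_ms: dict[str, int], timeout_ms: int = 950) -> dict[str, int]:
    """Age of each relay's bid once the proposer has heard from every relay.

    response_times_ms maps a relay label to the time (ms after the request)
    at which that relay answered. Relays slower than the timeout are dropped.
    """
    # The proposer decides when the slowest relay answers, or at the timeout.
    decision_time = min(max(response_times_ms.values()), timeout_ms)
    return {
        relay: decision_time - t
        for relay, t in response_times_ms.items()
        if t <= timeout_ms
    }


# One relay delays its response by 800 ms, the other answers after 50 ms.
ages = bid_ages_ms({"delaying-relay": 800, "immediate-relay": 50})
print(ages)
```

Under these assumptions the immediate relay's bid is 750 ms old by the time the proposer chooses, while the delaying relay's bid is fresh.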

Effect on Block Proposers

Timing games are zero-sum. With bid value roughly increasing linearly over time, the additional value a proposer gains by delaying their getHeader query by 1000 ms comes from the pockets of the following slot's proposer. The current landscape, in which some proposers and relays play heavy timing games while others have limited access to even basic latency tuning, creates a high discrepancy between proposers.

A more recent depiction of the progression of timing games. The 2 second gap between the solo staker and Kiln distributions suggests that solo stakers' blocks contain as little as ~10s worth of value while Kiln's blocks may contain up to ~14s worth.
Source: On Attestations, Block Propagation, and Timing Games by Toni Wahrstätter

The "tg/acc" perspective is that timing games are inevitable: there will be actors who push latency optimization to the extreme, and no social consensus will be sufficient to rein them in. The best thing Aestus can do is to help raise the baseline, ensuring that everyone has access to high-quality, albeit not perfectly optimized, getHeader timing controls.

Aestus Implementation

Our primary goal is to reinforce the level playing field between validators. Everyone from home stakers to Lido operators to centralized staking providers should have equal access to timing games. In implementing this system we also aim to optimize for transparency, validator control, and Ethereum consensus safety.

Timing Parameters

The Aestus Relay considers two parameters to determine the delay for a getHeader API call: headerDelay and headerCutoff. Aestus provides default values for each, but allows client customization. Simply put, for a call received t ms past the start of the slot:

delay = min(headerDelay, headerCutoff - t) - estimatedNetworkLatency

headerDelay

headerDelay is the length of time after the initiation of the request that the relay will wait (in ms) before returning the best bid to the caller. A longer delay gives builders more time to send valuable bids to the relay. Aestus uses multiple data sources to estimate when the client initiated the request, and includes estimated network latency in its delay calculations. A headerDelay of 500 ms would instruct Aestus to attempt to time its response so that the client receives it exactly 500 ms after the call was initiated.

This delay is primarily limited by the timeouts enforced by consensus clients (1000 ms) and the mev-boost client (950 ms).

headerCutoff

headerCutoff is an offset relative to the start of the proposer's slot, after which any delay will end and the top bid will be returned. Aestus additionally enforces a hard limit of 3000 ms (standard across relays), after which any new calls will fail and no bids will be returned. By setting an earlier headerCutoff, proposers may take advantage of timing games when block production is proceeding on time, but receive an immediate response when block production is running late. For a call received t ms into the slot:

  • t < headerCutoff: headerDelay will apply, bounded by headerCutoff.
  • headerCutoff < t < 3000 ms: No delay, top bid immediately returned.
  • 3000 ms < t: Too late, Aestus will not return a bid.
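The formula and the three cases above can be combined into one small function. This is a minimal sketch under stated assumptions, not the relay's actual code: the function name and argument names are hypothetical, the latency estimate is taken as a given input, negative results are clamped to zero, and the behavior exactly at the boundaries (t equal to headerCutoff or to 3000 ms) is a guess, since the text only specifies strict inequalities.

```python
from typing import Optional

HARD_LIMIT_MS = 3000  # past this point in the slot no bid is returned


def relay_delay_ms(
    t_ms: int,
    header_delay_ms: int,
    header_cutoff_ms: int,
    estimated_latency_ms: int = 0,
) -> Optional[int]:
    """Relay-side wait for a getHeader call received t_ms into the slot.

    Returns None when the call arrives too late to receive a bid.
    """
    if t_ms > HARD_LIMIT_MS:
        return None  # too late, no bid
    if t_ms >= header_cutoff_ms:
        return 0  # past the cutoff: answer immediately
    # delay = min(headerDelay, headerCutoff - t) - estimatedNetworkLatency
    delay = min(header_delay_ms, header_cutoff_ms - t_ms) - estimated_latency_ms
    return max(delay, 0)  # assumption: never wait a negative amount


print(relay_delay_ms(300, 800, 2000, estimated_latency_ms=50))
print(relay_delay_ms(1500, 800, 2000))
```

With the defaults described in this document, an early call at 300 ms with 50 ms of estimated latency waits 750 ms, while a call at 1500 ms is bounded by the cutoff and waits 500 ms.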

Default Timing Parameters

Aestus refers to the client's user agent to determine default parameters. User agents containing mev-boost, Vouch, or commit-boost are assumed to be validators participating in block production, and are given a delay by default. All other user agents are assumed to be relay monitors, builders, or other parties interested in tracking the current top bid, and are not assigned any delay by default.

Other software may request the default delay parameters by including the string delay anywhere in its user agent, or may request a custom delay by providing a ?headerDelay= parameter.

This design is opt-out, in order to provide improved bids for stakers who connect without being aware of advanced configuration options. Since most stakers will be connected to multiple relays, some with timing games of their own, this also ensures that Aestus's bids are competitive. Builders may also benefit from the reduced risk of a proposer accepting a cancelled bid.

  • headerDelay (validator client): 800 ms
  • headerDelay (other client): 0 ms
  • headerCutoff (all clients): 2000 ms
  • No bids delivered past: 3000 ms (mev-boost-relay default)
  • Internal latency estimation parameters set conservatively
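The default selection described above can be sketched as a simple lookup on the user agent string. This is an illustration only: the `TimingParams` type and `default_params` function are hypothetical names, and case-insensitive substring matching is an assumption, since the matching rules are not specified here beyond "containing".

```python
from dataclasses import dataclass

# Substrings that mark a client as a validator participating in block production.
VALIDATOR_MARKERS = ("mev-boost", "vouch", "commit-boost")


@dataclass(frozen=True)
class TimingParams:
    header_delay_ms: int
    header_cutoff_ms: int


def default_params(user_agent: str) -> TimingParams:
    """Pick default timing parameters from the user agent alone."""
    ua = user_agent.lower()  # assumption: matching ignores case
    wants_delay = any(marker in ua for marker in VALIDATOR_MARKERS) or "delay" in ua
    return TimingParams(
        header_delay_ms=800 if wants_delay else 0,
        header_cutoff_ms=2000,  # same cutoff for all clients
    )


print(default_params("mev-boost/1.8"))
print(default_params("curl/8.0"))
```

The user agent strings in the example are made up; any agent that matches none of the markers falls through to a delay of 0 ms.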

Configuration

Validators can provide custom headerDelay and headerCutoff parameters through their mev-boost relay list. This functionality works with standard mev-boost clients, with no need for custom builds.

Append URL query parameters at the end of your Aestus mev-boost relay entry as follows. Values are in milliseconds; provide the number alone, without decimal places or units.

-relay https://0xa15b52576bcbf1072f4a011c0f99f9fb6c66f3e1ff321f11f461d15e31b1cb359caa092c71bbded0bae5b5ea401aab7e@aestus.live?headerDelay={delayMs}&headerCutoff={cutoffMs}

An example could read:

?headerDelay=200&headerCutoff=2000

Parameters take precedence over user agent: a mev-boost user agent with ?headerDelay=0 would have timing games disabled, while an empty user agent with ?headerDelay=200 would receive responses after 200 ms.
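The precedence rule can be sketched with Python's standard URL parsing. This is a hypothetical illustration, not the relay's code: the function name, the example host and path, and the choice to fall back to the user-agent default when the value is missing or not a plain non-negative integer are all assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_header_delay(request_url: str, user_agent_default_ms: int) -> int:
    """Return the headerDelay to apply: the query parameter wins if it is valid."""
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("headerDelay")
    if values and values[0].isascii() and values[0].isdigit():
        return int(values[0])  # explicit parameter overrides the user agent
    return user_agent_default_ms  # nothing usable supplied: keep the default


# A validator user agent (default 800 ms) that opts out of timing games.
print(resolve_header_delay("https://relay.example/header?headerDelay=0", 800))
# An unrecognized user agent (default 0 ms) that asks for a 200 ms delay.
print(resolve_header_delay("https://relay.example/header?headerDelay=200", 0))
```

The same approach would apply to headerCutoff.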

When customizing these parameters, remember:

  • Your mev-boost client will time out getHeader calls after 950 ms, and there is some variability in network latency. If you set a headerDelay of 800 ms and a networking issue causes your request to be delayed by >150 ms, mev-boost will time out the request and you will not receive a bid!
  • A high headerCutoff could increase the risk of delaying block production too much. Timing games may increase the missed/wrong head vote rate of attesters voting on your block, and aggressive games may cause reorgs. The graph below shows how delaying bids too much can increase the risk of a forked/reorged block. Be aware: your personal probability will not match this 1:1.

Note that "winning bid arrival times" are different from getHeader request times, but suffice to show the general trend. Source: Latency is Money: Timing Games /acc, by Data Always
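The first caveat above, the 950 ms mev-boost timeout, comes down to a simple budget. The sketch below makes that arithmetic explicit; the function names are hypothetical, and it treats any unexpected network delay as adding directly to the chosen headerDelay.

```python
MEV_BOOST_TIMEOUT_MS = 950  # mev-boost getHeader timeout cited in this document


def latency_margin_ms(header_delay_ms: int) -> int:
    """Unexpected delay that can be absorbed before mev-boost gives up."""
    return MEV_BOOST_TIMEOUT_MS - header_delay_ms


def would_time_out(header_delay_ms: int, unexpected_delay_ms: int) -> bool:
    """True if the response would land after the mev-boost timeout."""
    return header_delay_ms + unexpected_delay_ms > MEV_BOOST_TIMEOUT_MS


print(latency_margin_ms(800))
print(would_time_out(800, 200))
print(would_time_out(200, 200))
```

A headerDelay of 800 ms leaves a 150 ms margin, so a 200 ms networking hiccup loses the bid, while the same hiccup with a headerDelay of 200 ms does not.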

Results

Aestus has been experimenting with timing games throughout the year, iterating on designs that started with simple and extremely conservative setups and moved to an advanced and more aggressive model in July.

Since July, the relay has been delaying with the goal of the proposer receiving a bid 800 ms after their request was initiated. The actual duration of the relay-side delay will be less than 800 ms if the estimated latency to the proposer is high, or if the cutoff is reached. So far, no proposers have used custom timing parameters outside of testing.

In a sample of recent delayed getHeader requests, Aestus had a mean relay-side delay of 630 ms, and a median of 735 ms. Most proposers make their request early with minimum latency, and receive nearly the maximum amount of delay to wait for better bids. The system adjusts for proposers with higher latency, as seen in the long tail.

Histogram of relay-side delay times from a sample of recent delayed getHeader calls, with a maximum of 800 ms delay. High estimated latency or a late getHeader request will cause the relay to delay for a shorter period of time so that the proposer receives their bid at the target time.

The practical effect of delays can be observed in the shift in distribution between the times getHeader requests are received, and the time when the delay is completed and the top bid is finally queried to be returned to the proposer. These are expressed as the time since the slot boundary, or "time into the slot".

In a sample of recent delayed requests, the request received times had a mean of 403 ms and a median of 291 ms into the slot. After applying delays, the top bid query times had a mean of 1040 ms and a median of 975 ms into the slot. This shift helps close the gap between casual users of the relay and more sophisticated entities.

Overlapping histograms of time into slot, with green showing the distribution of times when a getHeader request was received, and yellow showing the distribution of times when, after a delay is complete, the top bid is queried and returned to the proposer. The shift in distributions reflects the effect of delays. No bids are delayed past 2000 ms.

From the perspective of network consensus health, we see most top bid queries landing between 700-1300 ms into the slot. The winning bid must have arrived before that point, and so cross-referencing with Data Always' fork probability graph above we observe a negligible increase in fork chance with this degree of delay. In order for relay-side timing games to become dangerous to network health, proposers would first have to delay their requests more than they are currently; the 950 ms timeout constrains relay aggressiveness.

It is difficult to estimate how frequently we delay too long and the proposer's request times out, since in most slots multiple relays deliver the winning payload and it can be unclear which (one or multiple) relays actually delivered that winning bid. Work on quantifying this is ongoing.

Conclusion

Aestus's implementation of timing games as a service makes use of the 950 ms timeout on getHeader requests by targeting a return time of 800 ms, accounting for network latency to ensure bids are received on time. Proposers receive a median of 735 ms delay with no particular sophistication needed on their part. This helps close the timing games gap with sophisticated proposers and other relays.

Our default cutoff of 2000 ms into the slot, and modest overall bid query time distribution so far (especially compared to proposer-side timing games), suggest minimal impact on consensus health. However, we offer customizable parameters for proposers confident in their node setup to be more aggressive, or for others to be more conservative, while retaining the benefits and safety measures built into relay-side timing games.

We hope that, if timing games are truly an inevitability, Aestus can provide the best balance of increased value and safety for proposers and the Ethereum network.

Looking ahead, we would like to continue development of our timing games system for higher safety, and higher proposer rewards where possible. We see room for collaboration between relays on standardization of some aspects of timing games and proposer-controlled parameters.

Finally, we would like to encourage further development of protocol-level changes that would shift the burden of sophistication further from proposers (e.g. APS, rainbow staking).