Recently, there's been a lot of buzz about the validator timing game. In this post, I want to explore a different twist on this topic: the interplay between the consensus client and the execution client over the Engine API. I'll walk you through various scenarios, showing how this game unfolds and why your validator might sometimes miss an attestation head vote, or in more extreme cases, even a block. It's important to note that what I'm describing here is specific to the Prysm consensus layer client. It might not hold true for other clients, but hey, that's the beauty of client diversity!
First off, let's dive into how the CL (consensus layer) and EL (execution layer) work together after the merge. Your CL client has a few key tasks for the EL client, plus an extra request if it's proposing a block in the next slot. Here's what it boils down to: the CL sends the block's execution payload to the EL for validation (engine_newPayload), then tells the EL about the new head with a forkchoice update (engine_forkchoiceUpdated, or FCU for short). If the validator is proposing a block in the next slot, the CL also sends an FCU with payload attributes so the EL can start building that block.
Now, here's the key part: if the EL comes back with an error at any point in this process, the block is a no-go. It doesn't matter what the consensus part of the block says - if the EL isn’t happy, it's game over for that block.
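To make the exchange concrete, here's a minimal Go sketch of that per-slot interaction, collapsed into a single forkchoice update for simplicity (the actual FCU sequencing is discussed below). The EngineClient interface, the struct types, and importBlock are illustrative stand-ins, not Prysm or Engine API Go types; only the call ordering and the "EL error means the block is invalid" behavior reflect the description above.

```go
package enginesketch

import (
	"context"
	"fmt"
)

// Illustrative stand-ins for the Engine API payload structures.
type Payload struct{ BlockHash string }

type ForkchoiceState struct {
	HeadBlockHash      string
	SafeBlockHash      string
	FinalizedBlockHash string
}

type PayloadAttributes struct {
	Timestamp             uint64
	SuggestedFeeRecipient string
}

// EngineClient abstracts the two Engine API calls the CL relies on every slot.
type EngineClient interface {
	NewPayload(ctx context.Context, p *Payload) error                                          // engine_newPayload
	ForkchoiceUpdated(ctx context.Context, fc *ForkchoiceState, attr *PayloadAttributes) error // engine_forkchoiceUpdated (FCU)
}

// importBlock validates the execution payload first, then updates the EL's
// fork choice, attaching payload attributes only when this node proposes the
// next slot. Any EL error makes the block a no-go from the CL's perspective.
func importBlock(ctx context.Context, el EngineClient, p *Payload, proposingNextSlot bool, attr *PayloadAttributes) error {
	if err := el.NewPayload(ctx, p); err != nil {
		return fmt.Errorf("execution payload rejected, block is invalid: %w", err)
	}
	if !proposingNextSlot {
		attr = nil // payload attributes only matter when we build the next block
	}
	fc := &ForkchoiceState{HeadBlockHash: p.BlockHash}
	if err := el.ForkchoiceUpdated(ctx, fc, attr); err != nil {
		return fmt.Errorf("EL rejected the forkchoice update: %w", err)
	}
	return nil
}
```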
Next, let's look at the operations a typical CL client performs on its own, without any interaction with the EL client. These operations include:
As for the computationally and latency-intensive parts of these operations, they are:
Tasks 1 through 3 are mandatory for each slot, while task 4 is required once at the end of each epoch. Clients typically employ optimizations to perform these resource-intensive tasks during periods of minimal contention, generally in the latter half of the slot. This scheduling is strategic: blocks typically arrive in the first half of the slot, so the state required for the subsequent slot is precomputed towards the end of the current slot, after block processing. This approach ensures prompt processing of incoming blocks and is the essence of the "update some caches" operation.
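Here's a rough sketch of that scheduling idea, assuming 12-second slots: a background loop waits until the low-contention window late in each slot and then advances the state for the next slot. runCacheUpdater, slotStart, and advanceToNextSlot are hypothetical names for illustration, not Prysm internals.

```go
package cachesketch

import (
	"context"
	"time"
)

// With 12-second slots, schedule the precompute work well into the second half
// of the slot, after the block for the current slot has normally arrived.
const precomputeOffset = 8 * time.Second

// runCacheUpdater is a hypothetical background loop: once per slot, during the
// low-contention window, it advances the head state to the next slot (including
// epoch-boundary work such as shuffling when the next slot crosses an epoch) so
// that an incoming block can be processed against warm caches.
func runCacheUpdater(ctx context.Context, slotStart <-chan time.Time, advanceToNextSlot func(context.Context) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case start := <-slotStart:
			if wait := time.Until(start.Add(precomputeOffset)); wait > 0 {
				time.Sleep(wait)
			}
			// A failure here only costs latency in the next slot, so it is not fatal.
			_ = advanceToNextSlot(ctx)
		}
	}
}
```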
Now that we've got the basics down, let's look at how the Prysm CL client handles different situations, depending on when the block is processed and whether it becomes the head.
Let's explore the happy scenario first: the block is validated by consensus, execution, and (in Deneb) data availability. Fork choice successfully updates the head, extending the local fork choice view with the new block and its attestations. If, after these validations and the fork choice update, the block is still on time (within 4s into the slot) and the Prysm CL client is not lined up to propose the next slot, it sends an FCU to the execution layer client. After sending the FCU, Prysm updates its caches for the upcoming slot.
What happens when a Prysm validator is set to propose a block in the next slot in this happy case? After updating the cache for the next slot, the node sends a second FCU + payload attributes. You might wonder why two FCUs are sent, one with attributes and one without. The rationale is straightforward. The first FCU is tailored to the attester duty, giving attesters the most current head information as quickly as possible. To determine the payload attributes for the proposer, however, the node must first compute the cache for the next slot. That cache computation is mandatory for proposers but irrelevant to attesters. So the first FCU optimizes the attester duty, while the subsequent FCU + attributes fulfills the proposer duty.
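The following sketch captures the sequence just described: an FCU without attributes first, then the cache update, then a second FCU with payload attributes only when this node proposes the next slot. The helper names and types are assumptions for illustration, not Prysm's actual functions.

```go
package happypath

import "context"

// Minimal stand-ins; not Prysm types.
type forkchoiceState struct{ headBlockHash string }
type payloadAttributes struct{ timestamp uint64 }

type engineCaller interface {
	ForkchoiceUpdated(ctx context.Context, fc forkchoiceState, attr *payloadAttributes) error
}

// onBlockBecameHead mirrors the happy path: a fast FCU for attesters, then the
// next-slot cache update, then a second FCU with attributes only for a proposer.
func onBlockBecameHead(
	ctx context.Context,
	el engineCaller,
	head forkchoiceState,
	proposingNextSlot bool,
	updateCachesForNextSlot func(context.Context) error,
	nextSlotAttributes func() *payloadAttributes,
) error {
	// 1) FCU without attributes: give attesters the new head as fast as possible.
	if err := el.ForkchoiceUpdated(ctx, head, nil); err != nil {
		return err
	}
	// 2) Precompute the next slot's state; proposers need it to build attributes,
	//    attesters do not.
	if err := updateCachesForNextSlot(ctx); err != nil {
		return err
	}
	// 3) Second FCU + attributes, only when this node proposes the next slot.
	if proposingNextSlot {
		return el.ForkchoiceUpdated(ctx, head, nextSlotAttributes())
	}
	return nil
}
```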
What if the block is processed on time, yet the head block differs from the incoming block? In that case we refrain from updating the cache for the subsequent slot. Updating the cache would be redundant, since nothing in the next slot will use a non-canonical post state from the node's perspective. Instead, the Prysm CL client runs a separate background routine that performs the cache update at the 4s mark whenever the head slot diverges from the current slot. In this respect, a block that is not the head and a block that never arrives are treated the same. Additionally, if the current slot's block never arrived, the Prysm CL client will send an FCU + attributes for the proposer if it's proposing the next slot.
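A rough sketch of that background routine, under the same assumptions as the earlier sketches: at the 4s mark, if the head has not advanced to the current slot, it updates the next-slot caches and, when this node proposes the next slot, sends the FCU + attributes. All names here are hypothetical.

```go
package missinghead

import (
	"context"
	"time"
)

const attestationDeadline = 4 * time.Second // 4s into the slot

// headTracker is a hypothetical view of fork choice: the slot of the current
// head versus the wall-clock slot.
type headTracker interface {
	HeadSlot() uint64
	CurrentSlot() uint64
}

// runMissingHeadRoutine fires at the 4s mark of every slot. If the head has not
// advanced to the current slot (late block, missing block, or a block that is
// not head), it updates the next-slot caches and, when this node proposes the
// next slot, sends the FCU + payload attributes.
func runMissingHeadRoutine(
	ctx context.Context,
	slotStart <-chan time.Time,
	heads headTracker,
	proposingNextSlot func() bool,
	updateCachesForNextSlot func(context.Context) error,
	sendFCUWithAttributes func(context.Context) error,
) {
	for {
		select {
		case <-ctx.Done():
			return
		case start := <-slotStart:
			if wait := time.Until(start.Add(attestationDeadline)); wait > 0 {
				time.Sleep(wait)
			}
			if heads.HeadSlot() == heads.CurrentSlot() {
				continue // the current slot's block is the head; nothing to do here
			}
			_ = updateCachesForNextSlot(ctx)
			if proposingNextSlot() {
				_ = sendFCUWithAttributes(ctx)
			}
		}
	}
}
```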
Lastly, consider the situation where the block is processed late. In such a case, there's no need for the optimization of splitting the process into two separate FCU calls: we can use a single FCU + attributes if we're proposing the next slot, or an FCU without attributes if we're not. This approach is straightforward to reason about. However, it's important to note that we won't invoke the FCU + attributes for a late block that is not the new head. Given that a late block lacks the proposer boost, and certain client implementations may intentionally reorg such a late block, we will likely rely on the FCU + attributes already sent by the separate background routine mentioned above. The key takeaway is that when the block is delayed, there's no need for two FCUs.
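And a small sketch of the late-block decision, again with hypothetical helpers: a single FCU at most, with attributes only if we propose the next slot, and no FCU for a late block that did not become head, since the background routine above has already covered the proposer.

```go
package lateblock

import "context"

// Minimal stand-ins; not Prysm types.
type forkchoiceState struct{ headBlockHash string }
type payloadAttributes struct{ timestamp uint64 }

type engineCaller interface {
	ForkchoiceUpdated(ctx context.Context, fc forkchoiceState, attr *payloadAttributes) error
}

// onLateBlock handles a block processed after the 4s mark: one FCU at most,
// with attributes only if we propose the next slot, and no FCU for a late
// block that did not become head.
func onLateBlock(
	ctx context.Context,
	el engineCaller,
	blockIsHead bool,
	head forkchoiceState,
	proposingNextSlot bool,
	nextSlotAttributes func() *payloadAttributes,
) error {
	if !blockIsHead {
		// The late block is not the head: skip the FCU for it. If we propose the
		// next slot, the 4s background routine has already sent FCU + attributes.
		return nil
	}
	var attr *payloadAttributes
	if proposingNextSlot {
		attr = nextSlotAttributes()
	}
	return el.ForkchoiceUpdated(ctx, head, attr)
}
```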
Having explored most of the scenarios, let me add a few more points. In Deneb, fulfilling the data availability requirement is a prerequisite before a client can update the head. This will inevitably lengthen the process, and it's something that continues to warrant close observation. Also, updating the cache in an epoch's last slot may take longer than in other slots, primarily because there is additional work to do across the epoch boundary, such as computing the shuffling. Finally, some client implementations, such as Prysm and Lighthouse, will intentionally reorg late blocks. That sums it up! I hope this clears things up and offers a fresh perspective on how the consensus client functions and interacts with the execution client, all while fine-tuning for the validator client.