# Reducing Prysm Proposer Slot 0 propose time by 800ms
## The background
Every Ethereum slot is 12s, and the attestation cut-off is at the 4s mark: attesters vote on the head they have seen at 4s into the slot. A newly proposed block is highly subject to being reorged if it is not seen before the 4s cut-off, because it collects too few attestation votes.
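To make the timing concrete, here is a minimal Go sketch of how time-into-slot relates to the cutoff. The `timeIntoSlot` helper is my own illustration, not client code; the genesis timestamp is mainnet's.

```go
package main

import (
	"fmt"
	"time"
)

const (
	slotDuration      = 12 * time.Second // mainnet SECONDS_PER_SLOT
	attestationCutoff = 4 * time.Second  // attesters vote on head at the 4s mark
)

// timeIntoSlot is a hypothetical helper: elapsed time since the current slot began.
func timeIntoSlot(genesis, now time.Time) time.Duration {
	return now.Sub(genesis) % slotDuration
}

func main() {
	genesis := time.Unix(1606824023, 0) // mainnet genesis: 2020-12-01 12:00:23 UTC
	elapsed := timeIntoSlot(genesis, time.Now())
	// A block first seen after the cutoff risks being reorged for lack of votes.
	fmt.Printf("time into slot: %v, before 4s cutoff: %v\n", elapsed, elapsed < attestationCutoff)
}
```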
For the block proposal, a validator client does the following:
1. Validator client calls the `GetBlock` RPC endpoint
2. Beacon node builds and returns the block
3. Validator signs the block and calls the `ProposeBlock` RPC endpoint
4. Beacon node broadcasts the block to the rest of the network
(1) should happen at the **zero second** of the slot. (2) is the critical path: the beacon node builds the block by packing consensus objects (i.e. attestations, deposits, exits, etc.) and getting the execution payload from either the local EL client or the highest-bidding builder via mev-boost. At the 90th percentile, we have seen (2) take one to two seconds to complete. The remaining steps, (3) and (4), should be fast. Without mev-boost, a validator should see the block broadcast under 1s into the slot; with mev-boost, between 1s and 2s, depending on your network latency.
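As a sketch of that loop (the `BeaconNode` interface and `propose` helper here are hypothetical stand-ins; Prysm's actual RPC types are protobuf-based and differ):

```go
package proposer

import (
	"context"
	"fmt"
)

// BeaconNode is a hypothetical stand-in for the validator client's view of the
// beacon node RPC.
type BeaconNode interface {
	GetBlock(ctx context.Context, slot uint64) ([]byte, error) // (1)+(2): request, beacon node builds
	ProposeBlock(ctx context.Context, signed []byte) error     // (4): beacon node broadcasts
}

// propose walks steps (1)-(4): call GetBlock at the zero second of the slot,
// sign the returned block, then hand it back for broadcast.
func propose(ctx context.Context, node BeaconNode, slot uint64, sign func([]byte) []byte) error {
	block, err := node.GetBlock(ctx, slot) // critical path: packing + execution payload
	if err != nil {
		return fmt.Errorf("get block: %w", err)
	}
	return node.ProposeBlock(ctx, sign(block)) // (3) sign, (4) broadcast; both should be fast
}
```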
## The discovery
Two weeks ago, I wanted to improve validator logging by recording **when** the beacon node begins and ends building a block. Think of the start time and end time for (2).
Pull request: https://github.com/prysmaticlabs/prysm/pull/12452/
Example:
```
{"message":"Begin building block","prefix":"rpc/validator","severity":"INFO","sinceSlotStartTime":107853265,"slot":5792614}
{"message":"Finished building block","prefix":"rpc/validator","severity":"INFO","sinceSlotStartTime":297280711,"slot":5792614,"validator":54335}
```
From the logs above, the validator **begins** building the block 107ms into the slot and **ends** 297ms into the slot.
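The field itself is easy to derive; here is a minimal sketch of the measurement (hypothetical helpers, not the PR's actual code), where `sinceSlotStartTime` is nanoseconds elapsed since the slot began:

```go
package main

import (
	"log"
	"time"
)

const slotDuration = 12 * time.Second

// slotStart is a hypothetical helper: the wall-clock start of a given slot.
func slotStart(genesis time.Time, slot uint64) time.Time {
	return genesis.Add(time.Duration(slot) * slotDuration)
}

func main() {
	genesis := time.Unix(1606824023, 0) // mainnet genesis
	slot := uint64(5792614)
	// sinceSlotStartTime in the logs above is nanoseconds since slot start.
	since := time.Since(slotStart(genesis, slot))
	log.Printf("Begin building block slot=%d sinceSlotStartTime=%d", slot, since.Nanoseconds())
}
```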
As we said earlier, a validator should call `GetBlock` at the start of the slot. In an ideal world, `Begin building block`'s `sinceSlotStartTime` should be as close to 0 as possible. But why is it not 0? It could be local RPC latency between the beacon node and the validator client, or the validator could be performing "other tasks" beforehand. After reading the code, I discovered the validator client checks its exit status and **current epoch assignment** before calling `GetBlock`.
It quickly dawned on me that getting the current epoch assignment is not a cheap call, and that this significantly affects slot 0 proposer performance.
## The bottleneck
Let's take a step back. The validator client is just a dumb signer: it requests blocks and attestations from the beacon node to sign. It is not aware of when it needs to make those requests, so it calls `GetDuties` (once every epoch) to learn its attester slot and proposer slot. The proposer slots for the current epoch are safely known at the start of the epoch, at slot 0. With a caveat, the proposer slots of the next epoch are **semi-safely** known at the start of the current epoch as well: semi-safely because a proposer slot could change at the epoch boundary if the proposer's effective balance falls below the threshold, for example due to slashing. Here is the [spec definition](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/beacon-chain.md#compute_proposer_index)
```python=
effective_balance = state.validators[candidate_index].effective_balance
if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
    return candidate_index
```
Now you may ask: why do we care about the next epoch's proposer? Because the beacon node uses `ForkChoiceUpdate` + `PayloadAttribute` in the Engine API to signal its intent to the EL client for local block construction. For an epoch's slot 0 proposal, this must be done at the previous epoch's slot 31 rather than at slot 0 itself, to give the EL enough time to construct a profitable execution block. Here is the [spec definition](https://github.com/ethereum/execution-apis/blob/main/src/engine/paris.md#engine_forkchoiceupdatedv1)
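For reference, the shape of that call, as a Go sketch of the `engine_forkchoiceUpdatedV1` params from the linked spec (the hex-encoded DATA fields are simplified to strings here; this is not Prysm's wiring):

```go
package engine

// ForkchoiceState mirrors forkchoiceStateV1 from the linked spec.
type ForkchoiceState struct {
	HeadBlockHash      string `json:"headBlockHash"`
	SafeBlockHash      string `json:"safeBlockHash"`
	FinalizedBlockHash string `json:"finalizedBlockHash"`
}

// PayloadAttributes mirrors payloadAttributesV1. Passing non-nil attributes
// tells the EL to start building a payload for the upcoming slot, which is
// why the call for a slot 0 proposal has to go out at slot 31.
type PayloadAttributes struct {
	Timestamp             uint64 `json:"timestamp"` // timestamp of the slot being built
	PrevRandao            string `json:"prevRandao"`
	SuggestedFeeRecipient string `json:"suggestedFeeRecipient"`
}
```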
Another thing to note: after processing a block at slot 31, client implementations have a **lookahead optimization that advances the state to the next epoch**, so that work is not done when the slot 0 block arrives. Advancing to the next epoch involves precomputing the shuffling cache for the attester committees and the proposer. This can easily save up to 500ms when the slot 0 block arrives.
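Schematically (stand-in names; the real lookahead operates on the full beacon state):

```go
package lookahead

const slotsPerEpoch = 32

// state, processSlots, and warmShufflingCache are stand-ins for a client's
// internals, reduced to the shape of the optimization.
type state struct{ slot uint64 }

func processSlots(s *state, target uint64) { s.slot = target } // play empty slots forward
func warmShufflingCache(s *state)          {}                  // precompute committees + proposer

// advanceToNextEpoch runs after importing the slot 31 block: advance a copy of
// the state across the epoch boundary so shuffling is already cached when the
// slot 0 block arrives (worth ~500ms per the text above).
func advanceToNextEpoch(s state) {
	nextEpochStart := (s.slot/slotsPerEpoch + 1) * slotsPerEpoch
	processSlots(&s, nextEpochStart)
	warmShufflingCache(&s)
}
```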
Now here is what Prysm missed:
- At the end of epoch 1, the Prysm beacon node caches epoch 2's shuffling result
- Prysm beacon node does not attempt to cache epoch 3's shuffling result
- Prysm validator calls `GetDuties`, which returns epoch 2's and epoch 3's shuffling results. Epoch 2 is warm in the cache, but epoch 3 is cold. `GetDuties` computes epoch 3's shuffling on the fly, which adds an additional 500ms of latency to the call
That is why a Prysm validator proposing at slot 0 calls `GetBlock` 500ms late into the slot, which eats into the 4s attestation cutoff.
## The fix
After exploring many solutions, including extending shuffling cache to two epoch worth of data, I've settled on the [simplest solution](https://github.com/prysmaticlabs/prysm/pull/12484) for now. The solution is to call `GetProposerIndex` on a beacon state slot set to epoch 3 after caching epoch 2's shuffling result. This will warm the shuffling cache for epoch 3, so for `GetDuty` call at epoch 2 slot 0, it'll be fast, and we verified it only took 50ms. That's a reduction from 500ms to 50ms. Note there is still inefficiency, such as slot 31 being missed, which will be addressed in the subsequent PR.