Every Ethereum slot is 12s, and the attestation cut-off sits at the 4s mark. A newly proposed block is very likely to get reorged if it's not seen before the 4s cut-off, because it misses attestation votes: attesters vote on the head at the 4s mark.
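For concreteness, here is a minimal sketch (not client code) of that timing arithmetic: a slot's wall-clock start is the genesis time plus 12s per slot, and the attestation cutoff is 4s after it.

```go
package main

import (
	"fmt"
	"time"
)

const (
	secondsPerSlot      = 12 * time.Second
	attestationDeadline = 4 * time.Second // attesters vote on the head at this point
)

// slotStart returns the wall-clock start of a slot given the chain's genesis time.
func slotStart(genesis time.Time, slot uint64) time.Time {
	return genesis.Add(time.Duration(slot) * secondsPerSlot)
}

func main() {
	// Mainnet beacon chain genesis (2020-12-01 12:00:23 UTC), used here only for illustration.
	genesis := time.Unix(1606824023, 0).UTC()
	slot := uint64(5792614) // the slot from the logs later in this post

	start := slotStart(genesis, slot)
	fmt.Println("slot start:        ", start)
	fmt.Println("attestation cutoff:", start.Add(attestationDeadline))
}
```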
For the block proposal, a validator client does the following:

1. Call the GetBlock RPC endpoint
2. The beacon node builds the block and returns it
3. Sign the returned block
4. Call the ProposeBlock RPC endpoint to broadcast it

(1) should happen at the zero second of the slot. (2) is the critical path: the beacon node builds the block by packing consensus objects (i.e. attestations, deposits, exits, etc.) and getting the execution payload from either the local EL client or the highest-bid builder via mev-boost. We have seen that, at the 90th percentile, (2) takes one to two seconds to complete. The rest, (3) and (4), should be fast. Without mev-boost, a validator should see its block broadcast in under 1s; with mev-boost, between 1-2s, depending on your network latency.
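At a high level, the validator-side flow looks roughly like the sketch below. The type and method names (BeaconClient, GetBlock, ProposeBlock) mirror the steps above but are illustrative, not Prysm's exact gRPC API.

```go
// Package proposer sketches the validator client's block-proposal path.
package proposer

import "context"

type BeaconBlock struct{ Slot uint64 }

type SignedBeaconBlock struct {
	Block     BeaconBlock
	Signature [96]byte
}

type BeaconClient interface {
	// (1)+(2): the beacon node packs consensus objects (attestations, deposits,
	// exits, ...) and an execution payload (local EL or highest mev-boost bid),
	// then returns the unsigned block. This is the critical path.
	GetBlock(ctx context.Context, slot uint64) (*BeaconBlock, error)
	// (4): broadcast the signed block.
	ProposeBlock(ctx context.Context, blk *SignedBeaconBlock) error
}

type Signer func(*BeaconBlock) [96]byte

// ProposeAt should run at the very start of the slot; steps (3) and (4) are cheap.
func ProposeAt(ctx context.Context, c BeaconClient, sign Signer, slot uint64) error {
	blk, err := c.GetBlock(ctx, slot) // (1)+(2)
	if err != nil {
		return err
	}
	signed := &SignedBeaconBlock{Block: *blk, Signature: sign(blk)} // (3)
	return c.ProposeBlock(ctx, signed)                              // (4)
}
```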
Two weeks ago, I wanted to improve validator logging by adding logs for when block building begins and ends. Think of them as the start and end times of (2).
Pull request: https://github.com/prysmaticlabs/prysm/pull/12452/
Example:

```json
{"message":"Begin building block","prefix":"rpc/validator","severity":"INFO","sinceSlotStartTime":107853265,"slot":5792614}
{"message":"Finished building block","prefix":"rpc/validator","severity":"INFO","sinceSlotStartTime":297280711,"slot":5792614,"validator":54335}
```
From the logs above, the validator begins building the block about 107ms into the slot and finishes about 297ms into the slot (the sinceSlotStartTime values are in nanoseconds).
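A hedged sketch of how such a measurement can be taken around the build step (logrus-style; the helper names are mine, not the exact PR code):

```go
// Sketch of the added logging: record nanoseconds into the slot when block
// building begins and ends.
package proposer

import (
	"time"

	"github.com/sirupsen/logrus"
)

// logBuildWindow wraps the block-building step and logs how far into the slot
// each side of it happens, producing fields like the example log lines above.
func logBuildWindow(slotStart time.Time, slot uint64, build func() error) error {
	logrus.WithFields(logrus.Fields{
		"slot":               slot,
		"sinceSlotStartTime": time.Since(slotStart).Nanoseconds(), // ~107ms in the example
	}).Info("Begin building block")

	err := build()

	logrus.WithFields(logrus.Fields{
		"slot":               slot,
		"sinceSlotStartTime": time.Since(slotStart).Nanoseconds(), // ~297ms in the example
	}).Info("Finished building block")
	return err
}
```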
As we said earlier, a validator should call GetBlock at the start of the slot. In an ideal world, the Begin building block log's sinceSlotStartTime should be as close to 0 as possible. But why is it not 0? It could be local RPC latency between the beacon node and the validator client, or it could be that the validator is performing "other tasks" beforehand. After reading the code, I discovered the validator client checks its exit status and its current epoch assignment before calling GetBlock. It quickly dawned on me that getting the current epoch assignment is not a cheap call, and that this significantly affects slot 0 proposer performance.
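In other words, there is serial work sitting in front of GetBlock. A minimal sketch of that ordering, with illustrative stand-in names for those checks (not Prysm's exact calls):

```go
// Illustrative sketch of the work the validator client does before GetBlock.
package proposer

import "context"

type DutyClient interface {
	CheckExitStatus(ctx context.Context, validatorIndex uint64) error
	CurrentEpochAssignments(ctx context.Context, epoch uint64) error
}

// preProposalChecks runs at the start of the proposer's slot; only after it
// returns does the validator client call GetBlock, so any latency here pushes
// the whole proposal later into the slot.
func preProposalChecks(ctx context.Context, c DutyClient, validatorIndex, epoch uint64) error {
	if err := c.CheckExitStatus(ctx, validatorIndex); err != nil {
		return err
	}
	return c.CurrentEpochAssignments(ctx, epoch)
}
```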
Let's take a step back. The validator client is just a dumb signer: it requests blocks and attestations from the beacon node to sign. It's not aware of when it needs to make those requests, so it calls GetDuties (once every epoch) to get its attester slot and proposer slots. The proposer slots are safely known at the start of the epoch (slot 0). With a caveat, the proposer slots of the next epoch are semi-safely known at the start of the current epoch as well. Semi-safely, because a next-epoch proposer could change at the start of that epoch if its effective balance falls below the threshold, for example due to slashing. Here is the spec definition:
```python
effective_balance = state.validators[candidate_index].effective_balance
if effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
    return candidate_index
```
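Restating that check for readability (the constants mirror the spec; this is a sketch, not client code): a candidate sampled from the shuffling is accepted only if its effective balance clears a randomized threshold, which is why a balance drop, e.g. after slashing, can change a previously announced proposer.

```go
// Sketch of the spec's proposer acceptance check.
package proposer

const (
	maxRandomByte       = 1<<8 - 1       // MAX_RANDOM_BYTE in the spec
	maxEffectiveBalance = 32_000_000_000 // MAX_EFFECTIVE_BALANCE, in Gwei
)

// acceptCandidate mirrors: effective_balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte
func acceptCandidate(effectiveBalanceGwei uint64, randomByte byte) bool {
	return effectiveBalanceGwei*maxRandomByte >= maxEffectiveBalance*uint64(randomByte)
}
```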
Now you may ask, why do we care about the next epoch's proposer? We care because the beacon node uses ForkChoiceUpdated + PayloadAttributes in the Engine API to signal to the EL client that it should start local block construction. This must be done at slot 31 rather than slot 0 so the EL client has enough time to construct a profitable execution block (see the spec definition).
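As a rough sketch of what that call carries (the field names follow the public Engine API, Capella-era V2 shape; this is not Prysm's engine code):

```go
// Sketch of the payload-attributes signal the CL sends the EL for local block
// construction. ForkchoiceState and PayloadAttributesV2 are the two arguments
// of the engine_forkchoiceUpdatedV2 JSON-RPC call.
package proposer

type ForkchoiceState struct {
	HeadBlockHash      string `json:"headBlockHash"`
	SafeBlockHash      string `json:"safeBlockHash"`
	FinalizedBlockHash string `json:"finalizedBlockHash"`
}

type Withdrawal struct {
	Index          string `json:"index"`
	ValidatorIndex string `json:"validatorIndex"`
	Address        string `json:"address"`
	Amount         string `json:"amount"`
}

// PayloadAttributesV2 tells the EL the timestamp, randao mix, and fee
// recipient of the payload to start building. Sending it at slot 31 gives the
// EL a full slot to assemble a profitable payload for the slot 0 proposer.
type PayloadAttributesV2 struct {
	Timestamp             string       `json:"timestamp"`
	PrevRandao            string       `json:"prevRandao"`
	SuggestedFeeRecipient string       `json:"suggestedFeeRecipient"`
	Withdrawals           []Withdrawal `json:"withdrawals"`
}
```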
Another thing to note: after processing the block at slot 31, client implementations have a look-ahead optimization that advances the state to the next epoch ahead of time, so that this work isn't paid for when the slot 0 block arrives. Advancing to the next epoch involves precomputing the shuffling cache for the attester committees and the proposers. This can easily save up to 500ms when the slot 0 block arrives.
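A hedged sketch of that look-ahead, with illustrative names (not Prysm's actual functions): advance a copy of the post-slot-31 state to the next epoch boundary and populate the shuffling cache from it.

```go
// Sketch of the look-ahead optimization run after importing the slot-31 block.
package proposer

const slotsPerEpoch = 32

type BeaconState interface {
	Slot() uint64
	Copy() BeaconState
}

type ShufflingCache interface {
	// WarmEpoch computes and stores committees and proposer indices for epoch.
	WarmEpoch(st BeaconState, epoch uint64) error
}

// lookAhead advances a copy of the state through empty slots to the next epoch
// boundary and precomputes that epoch's shuffling, so the slot 0 block can be
// processed without paying that cost on arrival.
func lookAhead(st BeaconState, cache ShufflingCache, processSlots func(BeaconState, uint64) (BeaconState, error)) error {
	nextEpoch := st.Slot()/slotsPerEpoch + 1
	advanced, err := processSlots(st.Copy(), nextEpoch*slotsPerEpoch)
	if err != nil {
		return err
	}
	return cache.WarmEpoch(advanced, nextEpoch)
}
```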
Now here is what Prysm missed. At epoch 2 slot 0, the validator client calls GetDuty, which returns epoch 2's and epoch 3's shuffling results. Epoch 2 is warm in the cache, but epoch 3 is cold, so GetDuty computes epoch 3's shuffling on the fly, which adds roughly 500ms of latency to the call. That's the reason a Prysm validator proposing at slot 0 will call GetBlock about 500ms late into the slot, which eats into the 4s attestation cutoff.
After exploring many solutions, including extending the shuffling cache to two epochs' worth of data, I've settled on the simplest one for now: call GetProposerIndex on a beacon state whose slot is set into epoch 3, right after caching epoch 2's shuffling result. This warms the shuffling cache for epoch 3, so the GetDuty call at epoch 2 slot 0 is fast; we verified it takes only 50ms, a reduction from 500ms to 50ms. Note there is still some inefficiency, such as the case where slot 31 is missed, which will be addressed in a subsequent PR.
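A hedged sketch of the fix, with illustrative helper names (not the exact Prysm functions):

```go
// Sketch of the fix: after the look-ahead has cached epoch 2's shuffling,
// compute a proposer index on a state whose slot is set into epoch 3, which
// fills epoch 3's shuffling cache as a side effect.
package proposer

const slotsPerEpoch = 32

type BeaconState interface {
	Copy() BeaconState
	SetSlot(slot uint64)
}

// ProposerIndexFn computes the proposer for the state's current slot and, in a
// real client, reads/fills the shuffling cache while doing so.
type ProposerIndexFn func(st BeaconState) (uint64, error)

// warmEpoch3ProposerCache is called right after epoch 2's shuffling is cached.
// epoch2Start is the first slot of epoch 2; bumping one epoch ahead lands in epoch 3.
func warmEpoch3ProposerCache(st BeaconState, epoch2Start uint64, proposerIndex ProposerIndexFn) error {
	copied := st.Copy()
	copied.SetSlot(epoch2Start + slotsPerEpoch) // first slot of epoch 3
	_, err := proposerIndex(copied)             // warms epoch 3's shuffling cache
	return err
}
```

The trade-off is that this keeps the existing single-epoch shuffling cache instead of growing it to hold two epochs, at the cost of one extra proposer-index computation at the epoch boundary.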