We collect some measurements made on the Pyrmont testnet and the ETH 2.0 mainnet related to delays in block production and propagation at epoch transitions.
Validators assigned to attest in the first slot of an epoch are 20% likely to vote incorrectly. In each epoch a validator has a 1/32 chance of being assigned to that slot, and in this case a bad vote penalizes the validator for both head and target. Assuming perfect inclusion distance and a correct vote on source, this accounts for at least … of the validator rewards.
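As a rough illustration of how such a lower bound can be computed, here is a minimal sketch. It assumes the phase 0 attester reward model (this is an assumption, not taken from the measurements above). Source, target and head are each worth one base reward B when correct and cost B when missed. The inclusion reward is (B - B/8) divided by the inclusion distance, participation is treated as full, and a wrong target is assumed to also count as a wrong head. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed phase 0 constant: the proposer keeps B / 8 of the inclusion reward.
PROPOSER_REWARD_QUOTIENT = 8


def max_epoch_reward(inclusion_delay: int = 1) -> Fraction:
    """Ideal per-epoch attester reward in units of base_reward B (full participation assumed)."""
    inclusion = (1 - Fraction(1, PROPOSER_REWARD_QUOTIENT)) / inclusion_delay
    return 3 + inclusion  # source + target + head + inclusion


def expected_loss_fraction(
    p_first_slot: Fraction = Fraction(1, 32),
    p_wrong: Fraction = Fraction(1, 5),
    count_penalties: bool = True,
) -> Fraction:
    """Expected share of the ideal reward lost to wrong head+target votes in slot 0."""
    # A wrong head and target vote forgoes 2B of rewards and, under the assumed
    # model, also incurs 2B of penalties (a swing of 4B).
    swing = 4 if count_penalties else 2
    return p_first_slot * p_wrong * swing / max_epoch_reward()
```

Under these assumptions the ideal reward with perfect inclusion distance is 31B/8. `expected_loss_fraction()` evaluates to 1/155 (about 0.65%), or 1/310 (about 0.32%) if only forgone rewards are counted. Perfect inclusion distance maximizes the denominator, which is why the result is a lower bound.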
The approach is straightforward and minimal. For each block we receive, we collect the `slot`, the `delay`, the `graffiti`, and the attestation data. Here `delay` is measured as `timestamp - genesis_time - 12*slot`, where `timestamp` is the time at which we received the block. On Pyrmont, the graffiti lets us identify the client that proposed the block with little error, since dev teams and the EF are currently running over 90% of validators.
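A minimal sketch of the per-block record and the delay formula above. It assumes 12-second slots and 32 slots per epoch, as in the formula. `BlockRecord`, `make_record` and `slot_in_epoch` are hypothetical names, and the attestation data is left out for brevity.

```python
from dataclasses import dataclass

SECONDS_PER_SLOT = 12  # slot duration used in the delay formula
SLOTS_PER_EPOCH = 32


@dataclass(frozen=True)
class BlockRecord:
    """One received block (hypothetical layout; attestation data omitted)."""
    slot: int
    delay_ms: float
    graffiti: str


def block_delay_ms(timestamp: float, genesis_time: int, slot: int) -> float:
    """delay = timestamp - genesis_time - 12*slot, converted to milliseconds."""
    return (timestamp - genesis_time - SECONDS_PER_SLOT * slot) * 1000.0


def make_record(timestamp: float, genesis_time: int, slot: int, graffiti: str) -> BlockRecord:
    """Build a record from the local receive time of a block."""
    return BlockRecord(slot, block_delay_ms(timestamp, genesis_time, slot), graffiti)


def slot_in_epoch(slot: int) -> int:
    """Slot position relative to the beginning of the epoch (0..31)."""
    return slot % SLOTS_PER_EPOCH
```

For example, a block for slot 64 received 2.5 s after the slot started gives `delay_ms == 2500.0` and `slot_in_epoch(64) == 0`.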
Attestation data is collected as follows. When processing slot N, we let `TargetRoot` be the root of the block we received in the first slot of the epoch (slot N - N % 32) and `HeadRoot` be the root of the block we received in slot N-1. We declare an attestation's target vote correct if it votes for `TargetRoot`, and its head vote correct if it votes for `HeadRoot` (i.e., we assume that the monitoring node is on the canonical chain). In this particular case, the monitoring node stayed on the canonical chain for target during the whole measurement, so no manual intervention was necessary. Assuming the wrong `HeadRoot` for a few slots will not skew the aggregated data.
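The classification rule above can be sketched as follows. `AttestationVote`, `root_at_or_before` and `classify_vote` are hypothetical names. One assumption goes beyond the text: when a slot was skipped, the sketch falls back to the latest block received before it.

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 32


@dataclass(frozen=True)
class AttestationVote:
    """Hypothetical minimal view of an attestation's head and target votes."""
    beacon_block_root: str  # head vote
    target_root: str        # target vote


def root_at_or_before(roots_by_slot: dict[int, str], slot: int) -> str:
    """Root of the latest block we received at or before `slot` (skipped slots fall back)."""
    for s in range(slot, -1, -1):
        if s in roots_by_slot:
            return roots_by_slot[s]
    raise KeyError(f"no block received at or before slot {slot}")


def classify_vote(vote: AttestationVote, n: int, roots_by_slot: dict[int, str]) -> tuple[bool, bool]:
    """Return (target_correct, head_correct) for a vote processed at slot N."""
    target_root = root_at_or_before(roots_by_slot, n - n % SLOTS_PER_EPOCH)  # TargetRoot
    head_root = root_at_or_before(roots_by_slot, n - 1)                      # HeadRoot
    return vote.target_root == target_root, vote.beacon_block_root == head_root
```

Like the measurement itself, this trusts the monitoring node's view of the chain (`roots_by_slot`) to be canonical.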
The last systematic error we make is that `timestamp` is measured after the monitoring node (Prysm in this case) has synced the block. Nisdas at prysmaticlabs benchmarked this overhead at less than 15 ms for slots other than the first two, and less than 100 ms and 60 ms for slots 0 and 1, respectively [1].
This graph of block propagation delay on Pyrmont shows the problem immediately. The scale is logarithmic, and each point represents a received block proposed by the color-coded client in the given slot (we count slots relative to the beginning of the epoch). Most blocks in the first slot arrive more than two seconds after the start of the slot. We can immediately pinpoint some client problems/qualities:
One issue with the above graph is that it reflects the point of view of a single node in the network. To rule out a skew in block arrival times specific to our node, we look at the number of wrongly voted blocks. We expect a strong correlation between the delay with which a block arrived and the number of wrong head votes it received.
In the following graph, each point represents a block. The color encodes the client. The x-axis is logarithmic and shows the delay in milliseconds.
We confirm that Lighthouse's blocks are concentrated in a small time window, Teku's arrive consistently earlier, and Prysm's are all over the place. While most fast-arriving blocks are correctly voted, there is a large concentration of blocks with essentially all votes wrong at every latency. This can be caused by short-lived forks. In fact, if we check the same metric on mainnet, voting is generally much better (almost all points are at the bottom), but we also see a small stripe at the top where essentially all votes are wrong.
We can also correlate wrong head votes with the position of the slot within the epoch. Here is the data for Pyrmont:
We confirm our thesis about Prysm nodes on slot 0: Prysm-proposed blocks generate over 95% of wrong votes in slot 0. Apart from this, most clients are comparable by this measure. One might point out Lighthouse's performance on slot 0, which is consistent with our observation above that Lighthouse performs like Teku on this slot.
The same measure on mainnet is much better, although the first few slots still show a higher share of wrong votes. Warning: this could be due to an insufficient number of data points on mainnet. This document will be updated.
In the graphs above, I plotted the percentage of wrong votes among all votes, summed over each congruence class of slots (i.e., slot mod 32). This is the measure most relevant from the validator's perspective, since it reflects how likely you are to vote wrong and be penalized. If, however, we look at the percentage of blocks that were badly voted (namely, with more than 50% bad votes), we see the following on mainnet:
Contrasting this graph with the previous one is instructive: we are getting some blocks in slot 1 (the second slot) with many bad votes, but most blocks in slots > 0 are correctly voted.
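The two aggregations discussed above can differ noticeably, and a minimal sketch makes the difference concrete. The first is the vote-weighted share of wrong votes per slot position (slot mod 32). The second is the share of blocks at that position with more than 50% bad votes. The `(slot, wrong_votes, total_votes)` input layout and the function name are hypothetical.

```python
from collections import defaultdict

SLOTS_PER_EPOCH = 32


def per_slot_metrics(blocks):
    """Map slot position -> (wrong-vote %, badly-voted-block %).

    `blocks` is an iterable of (slot, wrong_votes, total_votes) tuples.
    """
    wrong, total = defaultdict(int), defaultdict(int)
    bad_blocks, n_blocks = defaultdict(int), defaultdict(int)
    for slot, w, t in blocks:
        if t == 0:
            continue  # block with no votes: nothing to score
        k = slot % SLOTS_PER_EPOCH  # congruence class of the slot
        wrong[k] += w
        total[k] += t
        n_blocks[k] += 1
        if w / t > 0.5:  # "badly voted": more than 50% bad votes
            bad_blocks[k] += 1
    return {k: (100 * wrong[k] / total[k], 100 * bad_blocks[k] / n_blocks[k]) for k in sorted(total)}
```

In a toy input where slot position 1 has one fully bad block and one good block, the vote-weighted metric stays moderate while the block-level metric reaches 50%. This mirrors the slot 1 pattern described above.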
Regardless of what happens in slot 0, we would expect good target votes after a few slots. Indeed, we see that validators are much more likely to vote incorrectly for target in the first slot:
Yet we still see 20% wrong target votes after half an epoch. In fact, there are blocks with bad votes at every slot.
On mainnet again these metrics are much better:
If we sum over all blocks in the epoch on Pyrmont, we see that no particular client produces blocks with worse voting than the others:
Prysm gets a lower total number of wrong votes but the same proportion of wrong votes, meaning that Prysm-produced blocks seem to receive fewer votes on Pyrmont, at least during this measurement. Warning: during part of this measurement, EF Prysm nodes were down.
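A small sketch shows the distinction between the total number of wrong votes and the proportion of wrong votes. It aggregates both per client from hypothetical `(client, wrong_votes, total_votes)` tuples. The function name and data are illustrative only.

```python
from collections import defaultdict


def per_client_totals(blocks):
    """Map client -> (total wrong votes, total votes, wrong-vote ratio)."""
    acc = defaultdict(lambda: [0, 0])
    for client, wrong, total in blocks:
        acc[client][0] += wrong
        acc[client][1] += total
    return {c: (w, t, w / t if t else 0.0) for c, (w, t) in acc.items()}
```

With made-up input where one client's blocks simply receive fewer votes, that client ends up with fewer wrong votes in absolute terms but the same ratio. This is the pattern described for Prysm.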
No conclusions yet; I will eventually write something up once I gather enough data on mainnet.
[1] https://github.com/prysmaticlabs/prysm/tree/benchmarkGossip