Date: March 15, 2022
Shortly after terminal total difficulty (TTD) was reached (~3/15 3:00 PM UTC), Prysm nodes proposed bad blocks, and nodes processing them logged the following error:
```
{"error":"could not process block: could not verify new payload: could not validate execution payload from execution engine: could not validate block hash: ","message":"Could not handle p2p pubsub","prefix":"sync","severity":"ERROR","topic":"/eth2/e7acb210/beacon_block/ssz_snappy"}
```
A similar error was observed over the wire:
```
ERROR sync: Could not handle p2p pubsub error=could not validate block hash:
could not validate execution payload from execution engine
```
Data shows that no beacon blocks from validators run by the EF and Prysmatic Labs were accepted. The affected client combinations were Prysm-Geth and Prysm-Nethermind. The missing blocks accounted for ~15-20% of all blocks. There was no impact on attestation participation, although the valid blocks were fuller because they carried the attestations that would otherwise have been included in the missing blocks.
The Prysm beacon node used the wrong endianness when marshaling/unmarshaling the `base_fee_per_gas` field of the `execution_payload` object. The execution layer represents this field as big-endian, while the consensus layer uses little-endian. Because Prysm did not convert `execution_payload` back to its original form correctly, the execution layer client rightly rejected the malformed payload when Prysm called the `engine_newPayloadV1` endpoint, returning `INVALID_BLOCK_HASH`.
The issue was identified by comparing `execution_payload` before and after unmarshaling. At the same time, Mario Vega and MariusVanDerWijden noticed a similar pattern:
After TTD was reached, the Kiln testnet block explorer began reporting missing Prysm blocks, and the error logs on cluster nodes confirmed it. The issue did not show up on previous devnets because, as Marius pointed out, the base fee there was 7, a single-byte value that reads the same in either byte order. It also did not show up in the unit tests because 7 was used as the input value.
Action Item | Type | Owner | Relevant Link |
---|---|---|---|
Fix base fee per gas endianness | Code change | Terence | |
Test the fix in cluster | Testing | Terence | |
Update e2e test to include tx generator | Testing | Nishant | |
Add differential fuzzing for engine api round trip | Testing | Nishant | |
Release docker image | Release | Terence | https://gcr.io/prysmaticlabs/prysm/beacon-chain:kiln-3ea8b7 |
Post-mortem | Documentation | Terence | |
What went wrong
Where we got lucky