# Kiln testnet block proposal failure

Date: March 15, 2022

## Incident Summary

Shortly after the terminal total difficulty (TTD) was reached (~3/15 3:00 PM UTC), Prysm nodes proposed bad blocks, producing the following error:

```
{"error":"could not process block: could not verify new payload: could not validate execution payload from execution engine: could not validate block hash: ","message":"Could not handle p2p pubsub","prefix":"sync","severity":"ERROR","topic":"/eth2/e7acb210/beacon_block/ssz_snappy"}
```

A similar error was observed over the wire:

```
ERROR sync: Could not handle p2p pubsub error=could not validate block hash: could not validate execution payload from execution engine
```

## Impact

Data shows that no beacon blocks came out of the validators run by the EF and Prysmatic Labs. The affected client combinations were `Prysm - Geth` and `Prysm - Nethermind`. The missing blocks account for ~15-20% of the total blocks. There was no impact on attestation participation, although the valid blocks were fuller because they had to include the attestations that the missing blocks would have carried.

## Root causes

The Prysm beacon node used the incorrect endianness to marshal/unmarshal the `base_fee_per_gas` field in the `execution_payload` object. Today, the execution layer uses big-endian encoding and the consensus layer uses little-endian encoding. Because Prysm did not restore `execution_payload` to its original form when unmarshalling it, the execution layer client correctly rejected the deformed payload on the `engine_newPayloadV1` call by returning `INVALID_BLOCK_HASH`.

## Resolution

The issue was identified by comparing the `execution_payload` before and after unmarshalling. At the same time, Mario Vega and MariusVanDerWijden also noticed a similar pattern:

![](https://i.imgur.com/22rfmXg.png)

Upon discovery, the endianness bug was quickly patched. We then tested the patch on local and cluster setups before building the docker image and releasing it to everyone else.

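
To make the endianness mismatch from the root cause concrete, here is a minimal Go sketch. It is not Prysm's actual code: the helper names (`encodeLE`, `decodeLE`, `decodeBE`), the 32-byte field width, and the sample fee value are assumptions chosen for illustration. It shows that a little-endian value decodes to a different number when the byte swap is skipped.

```go
package main

import (
	"fmt"
	"math/big"
)

// reverse returns a copy of b with its byte order flipped.
func reverse(b []byte) []byte {
	out := make([]byte, len(b))
	for i, v := range b {
		out[len(b)-1-i] = v
	}
	return out
}

// encodeLE encodes fee as a 32-byte little-endian value. The 32-byte width
// is an assumption of this sketch (a uint256-sized field).
func encodeLE(fee uint64) []byte {
	buf := make([]byte, 32)
	new(big.Int).SetUint64(fee).FillBytes(buf) // big-endian, zero-padded on the left
	return reverse(buf)
}

// decodeLE interprets b as a little-endian integer (the correct reading).
func decodeLE(b []byte) *big.Int {
	return new(big.Int).SetBytes(reverse(b)) // SetBytes expects big-endian input
}

// decodeBE interprets b as a big-endian integer, i.e. without the byte swap.
func decodeBE(b []byte) *big.Int {
	return new(big.Int).SetBytes(b)
}

func main() {
	const fee = 1_000_000_007 // arbitrary multi-byte value, hypothetical
	le := encodeLE(fee)
	fmt.Printf("little-endian bytes:  %x\n", le)
	fmt.Println("decoded with swap:   ", decodeLE(le))
	fmt.Println("decoded without swap:", decodeBE(le))
}
```

Any payload field that changes value this way also changes the block hash computed from it, which is consistent with the `INVALID_BLOCK_HASH` response described above.
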
## Detection

After TTD was reached, the Kiln testnet block explorer began reporting missing Prysm blocks, and the error logs in the cluster nodes confirmed it. This issue did not show up in the previous devnets because, as Marius pointed out, the base fee there was 7, which is equal regardless of endianness. It also did not show up in the unit tests because 7 was used as the input value.

## Action items

| Action Item | Type | Owner | Relevant Link |
| -------- | -------- | -------- | -------- |
| Fix base fee per gas endianness | Code change | Terence | |
| Test the fix in cluster | Testing | Terence | |
| Update e2e test to include tx generator | Testing | Nishant | |
| Add differential fuzzing for engine API round trip | Testing | Nishant | |
| Release docker image | Release | Terence | https://gcr.io/prysmaticlabs/prysm/beacon-chain:kiln-3ea8b7 |
| Post mortem | Documentation | Terence | |

## Lessons Learned

**What went wrong**

- Prysm proposers were unable to propose blocks

**Where we got lucky**

- Client diversity. Even with Prysm proposers down, the chain was relatively healthy
- Community support. People like Mario and Marius were around to help with the debugging. (Thanks!)
- Great tooling. We added a tool that lets a Prysm beacon node fake-propose every slot for faster triage
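
The detection gap and the round-trip testing action item can be sketched as follows. This is a simplified randomized round-trip check, not the differential fuzzer itself and not Prysm's code. It assumes the value crosses the boundary as a minimal-length big-endian byte string (as in a hex quantity such as `0x7`), and every helper name (`buggyToConsensus`, `fixedToConsensus`, `roundTrips`, `firstFailure`) is hypothetical. Under that assumption, a single-byte value like 7 survives a missing byte swap, while almost any multi-byte value does not.

```go
package main

import (
	"fmt"
	"math/big"
	"math/rand"
)

// reverse returns a copy of b with its byte order flipped.
func reverse(b []byte) []byte {
	out := make([]byte, len(b))
	for i, v := range b {
		out[len(b)-1-i] = v
	}
	return out
}

// minimalBE returns v as minimal-length big-endian bytes (no leading zeros).
func minimalBE(v uint64) []byte { return new(big.Int).SetUint64(v).Bytes() }

// padRight32 copies b into a 32-byte buffer, filling the tail with zeros.
func padRight32(b []byte) []byte {
	out := make([]byte, 32)
	copy(out, b)
	return out
}

// buggyToConsensus stores big-endian bytes in a little-endian field
// without swapping them (the class of bug this sketch assumes).
func buggyToConsensus(v uint64) []byte { return padRight32(minimalBE(v)) }

// fixedToConsensus swaps the byte order before storing the value.
func fixedToConsensus(v uint64) []byte { return padRight32(reverse(minimalBE(v))) }

// fromConsensus decodes a 32-byte little-endian field.
func fromConsensus(b []byte) *big.Int { return new(big.Int).SetBytes(reverse(b)) }

// roundTrips reports whether v survives conversion into the field and back.
func roundTrips(toConsensus func(uint64) []byte, v uint64) bool {
	return fromConsensus(toConsensus(v)).Cmp(new(big.Int).SetUint64(v)) == 0
}

// firstFailure tries n pseudo-random values and returns the first one
// that does not round-trip, if any.
func firstFailure(toConsensus func(uint64) []byte, seed int64, n int) (uint64, bool) {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < n; i++ {
		if v := rng.Uint64(); !roundTrips(toConsensus, v) {
			return v, true
		}
	}
	return 0, false
}

func main() {
	fmt.Println("buggy conversion, fee=7:         ", roundTrips(buggyToConsensus, 7))
	fmt.Println("buggy conversion, fee=1000000007:", roundTrips(buggyToConsensus, 1_000_000_007))
	if v, found := firstFailure(buggyToConsensus, 1, 1000); found {
		fmt.Println("random search found a failing input:", v)
	}
	_, found := firstFailure(fixedToConsensus, 1, 1000)
	fmt.Println("fixed conversion has a failing input:", found)
}
```

A test that only feeds in 7 passes against both conversions, whereas randomized inputs separate them almost immediately, which is the motivation for adding fuzzing to the engine API round trip.
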