owned this note
owned this note
Published
Linked with GitHub
# Teku+Besu Goerli Merge Issue
Merge block was beacon chain block 3639527 er(`0x4f361e3a144edf3bc08a9718f2efc70d6d68d1f07eb271b9bb73a17140473691`), execution block hash `0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd` execution block height 7382819.5 Parent execution hash: 0xa1be33e115ca56b51b80f2b7070d8062c3959f0b4a4f5fa35e91135894f55d30
Problematic block:
execution hash: 2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d
```
"number": "0x70a722", (7382818)
"hash": "0x2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d",
"mixHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
"parentHash": "0x0579e6ad4e5287d8750ca242be2d056d9cf6102b1120a27eecc8231d086f1648",
"nonce": "0x0000000000000000",
"sha3Uncles": "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
"logsBloom": "0x0c01480002434800425010000150030120084850a10408b00002200618400483240092000000c42600088800100a08080110200301021fa0080242240037100000925a19013002214801400904204000021010004504140000c4000000831000280420001203420144002a2081244808020c2000000010002001801082364070610101000c90120a680008008156201214b00fe00919010a8420066550408010122c0a081832260828c0710005040001260420001005001180630804e12008000031a00228000180c0a010008101c0040040800494a004000801300000086340001038000011101048009000250c80000001000280802c4c8041810002402482",
"transactionsRoot": "0x7259ca4a6802a1b37e61e28d4d678d2f71347efcdf6c3f00c44019264f4eb19c",
"stateRoot": "0x78e2c2ab4c8bc3879258af13b83e516a094bfa7b72740166809b7eed717b7076",
"receiptsRoot": "0x3bb310e5e9b722b115145dc4556e1380d2444ca311674ef4df83dc88fbdbc572",
"miner": "0x0000000000000000000000000000000000000000",
"difficulty": "0x1",
"totalDifficulty": "0xa4a470",
"extraData": "0x696e667572612e696f0000000000000000000000000000000000000000000000126647e19df95b23d0b8125b951c61c4da15a50aa05d30005d222e98ff1326024bb9e5df1c0ac845da69eab0d16d02e389eeabca20a4df68bc5a43e716b88fbb00",
"baseFeePerGas": "0x8",
"size": "0x12ddf",
"gasLimit": "0x1c9c380",
"gasUsed": "0x14cf5e6",
"timestamp": "0x62f45f1a",
```
So besu first saw a terminal block which wound up being a fork and Teku selected it as the terminal transition block.
```
{"@timestamp":"2022-08-11T01:45:12,914","level":"INFO","thread":"beaconchain-async-1","class":"teku-event-log","message":"Merge *** Terminal Block detected: 2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d","throwable":""}
```
At this point Teku will send fCU with the terminal block it has selected as the chain head until it eventually imports the merge beacon block and will then switch to using the execution payload from that block.
Initially Besu gives a VALID response to Teku:
```
2022-08-11 01:45:16.449+00:00 | vert.x-worker-thread-0 | INFO | EngineForkchoiceUpdated | VALID for fork-choice-update: head: 0x2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d, finalized: 0x0000000000000000000000000000000000000000000000000000000000000000, safeBlockHash: 0x0000000000000000000000000000000000000000000000000000000000000000
```
Then Teku would have received a merge beacon block from another node which picked a different terminal block. It gets a SYNCING response from besu and is unable to optimistically import the merge block:
```
{"@timestamp":"2022-08-11T01:45:24,558","level":"WARN","thread":"forkchoice-async-0","class":"BlockManager","message":"Unable to import block: Execution Client is still syncing","throwable":""}
```
**FIXME:** Teku should log the block root and execution payload hash of the block when it can't import because of a syncing response.
Despite the lack of logs, this must have been the block at slot 3639527. That contained execution payload `0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd` (7382819) with the parent being our alternate terminal block `0xa1be33e115ca56b51b80f2b7070d8062c3959f0b4a4f5fa35e91135894f55d30`
Besu has not yet seen `0xa1be33e115ca56b51b80f2b7070d8062c3959f0b4a4f5fa35e91135894f55d30` so is correct in returning SYNCING. Backward sync appears to have worked except with some CliqueExtraData warnings:
```
2022-08-11 01:45:25.279+00:00 | nioEventLoopGroup-3-3 | INFO | BackwardSyncStep | Saved headers 7382818 -> 7382619 (head: 7382818)
2022-08-11 01:45:25.341+00:00 | ForkJoinPool.commonPool-worker-12 | INFO | BackwardsSyncAlgorithm | Backward sync reached ancestor header, starting Forward sync
2022-08-11 01:45:25.544+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.545+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.577+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.581+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.581+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.588+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.600+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.604+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.616+00:00 | nioEventLoopGroup-3-6 | WARN | CliqueExtraData | Expected a CliqueExtraData instance but got null. Reparsing required.
2022-08-11 01:45:25.707+00:00 | nioEventLoopGroup-3-6 | INFO | ForwardSyncStep | Saved blocks 7382818 -> 7382818 (target: 7382819)
```
But then it rejects block 7382819 (0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd):
```
2022-08-11 01:45:27.141+00:00 | vert.x-worker-thread-0 | INFO | AbstractBlockProcessor | Block processing error: transaction invalid 'NONCE_TOO_LOW'. Block 0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd Transaction 0x1e5786508fc3c00c6254accfd1a82f9fbc1f7bf717c8db7fae96d9c547b46024
2022-08-11 01:45:27.141+00:00 | vert.x-worker-thread-0 | ERROR | MainnetBlockValidator | Error processing block. Block 7382819 (0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd)
2022-08-11 01:45:27.143+00:00 | vert.x-worker-thread-0 | WARN | EngineNewPayload | Invalid new payload: number: 7382819, hash: 0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd, parentHash: 0xa1be33e115ca56b51b80f2b7070d8062c3959f0b4a4f5fa35e91135894f55d30, latestValidHash: 0x0000000000000000000000000000000000000000000000000000000000000000, status: INVALID, validationError: Error processing b
lock
2022-08-11 01:45:27.333+00:00 | ForkJoinPool.commonPool-worker-7 | INFO | BackwardsSyncAlgorithm | The Backward sync is done...
```
**FIXME:** That rejection is incorrect. `0xb43d883d637e8434b3f664e0f8ba0d1914d4d9c01a46be1dfc5f1ecd7bd381fd` is the execution block hash for the canonical block at height 7382819 (beacon block 3639527 - `0x4f361e3a144edf3bc08a9718f2efc70d6d68d1f07eb271b9bb73a17140473691`). That's the block Teku originally got a SYNCING response for, and when it retried execution then got an INVALID response:
```
{"@timestamp":"2022-08-11T01:45:26,559","level":"INFO","thread":"beaconchain-async-1","class":"FailedExecutionPool","message":"Retrying execution of block 4f361e3a144edf3bc08a9718f2efc70d6d68d1f07eb271b9bb73a17140473691 (3639527)","throwable":""}
{"@timestamp":"2022-08-11T01:45:27,148","level":"WARN","thread":"forkchoice-async-0","class":"ForkChoiceStrategy","message":"Payload for child of block root 0xfcc228a0a5bfb08ca7cb9d4dafe6d05ec19fe7c87c597a38cd8bc382cb1b6789 was invalid","throwable":""}
```
**THEORY**: One thing to note here is that the log messages suggest the backward sync completed successfully, but the retry from Teku triggered the import failure and that retry ran just before the backward sync completed. Could there be a race condition here and the retry newPayload call wind up executing with an incorrect or only partially updated world state?
At this point Teku is unable to follow the canonical chain because it has been told its invalid. It is left with it's chain head at the last pre-merge beacon block.
**FIXME:** Teku should log the actual block slot and root when newPayload returns invalid instead of just "Payload for child of block root".
Eventually, one of the validators is assigned to propose, so builds and imports beacon block 3639557 `9135525cd1ff878a37bd42bf254198a732bb125f09c8fcaaf99ec6804c0bccf5`. Which contained execution payload `0x321035c2dfaf9adc3f9c49cf4908acfe35a984566c35ddaa91b821cd8874f293` (height 7382819) with parent `0x2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d` (which is our original problematic block that Teku picked as its terminal block). This triggers Teku to print pandas and make it look like Teku caught up on metrics - then it starts falling behind again until next time it produces a block.
Teku then reorgs to a pre-merge block:
```
{"@timestamp":"2022-08-11T01:51:40,033","level":"INFO","thread":"forkchoice-async-0","class":"teku-event-log","message":"Reorg Event *** New Head: 5c8bdf5860d0dd087ae0ab4ee7346fcf30b4ff7f424ce8a0cea36675a469eba2 (3639558), Previous Head: 9135525cd1ff878a37bd42bf254198a732bb125f09c8fcaaf99ec6804c0bccf5 (3639557), Common Ancestor: fcc228a0a5bfb08ca7cb9d4dafe6d05ec19fe7c87c597a38cd8bc382cb1b6789 (3639526)","throwable":""}
```
From that point on besu starts rejecting the problematic block:
```
2022-08-11 01:51:40.036+00:00 | vert.x-worker-thread-0 | INFO | MergeCoordinator | Ignoring update to old head
2022-08-11 01:51:52.026+00:00 | vert.x-worker-thread-0 | INFO | MergeCoordinator | Ignoring update to old head
2022-08-11 01:52:04.026+00:00 | vert.x-worker-thread-0 | INFO | MergeCoordinator | Ignoring update to old head
2022-08-11 01:52:16.027+00:00 | vert.x-worker-thread-0 | INFO | MergeCoordinator | Ignoring update to old head
2022-08-11 01:52:28.028+00:00 | vert.x-worker-thread-0 | INFO | MergeCoordinator | Ignoring update to old head
2022-08-11 01:52:28.028+00:00 | vert.x-worker-thread-0 | INFO | EngineForkchoiceUpdated | INVALID for fork-choice-update: head: 0x2d0230d6a70a8ad017c0cb846fc931c8c8507a7677e48d5a2a4a208279ab3f4d, finalized: 0x0000000000000000000000000000000000000000000000000000000000000000, safeBlockHash: 0x0000000000000000000000000000000000000000000000000000000000000000
```
Since Teku is back to a pre-merge head, it will be sending fCU with the terminal block it has selected as the head block, but now besu is saying it's invalid.
**FIXME:** I believe besu is incorrect to reject this here. It is valid for the beacon chain to re-org back to a pre-merge chain and then try to build on a different transition block.