Debugging

Logs: https://drive.google.com/file/d/1M5rcOt_WvilZIUgxabjMrSg7kMtPRBTQ/view

What happened to 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189?

Note:

This document refers to two blocks:

Significant Log Events

We try to look for the parent (1b28) (I didn't bother to figure out the cause of this lookup):

Sep 26 07:23:55.224 DEBG Searching for block, block: 0x1b28…49c4, peer_id: 16Uiu2HAmT4bUv8e8XWUH4XaB4J3ipAgHrEGcYFLbXjdYYuuoyugP, service: sync, module: network::sync::block_lookups:81

We still haven't managed to import 1b28, although we've requested/searched for the it a lot. Now, we receive the grandchild and realise we don't know the child:

Sep 26 07:23:59.750 DEBG Rejected gossip block, slot: 4779418, graffiti: , error: ParentUnknown(parent_root:0x1b28…49c4), service: beacon, module: beacon_chain::beacon_chain:2403
Sep 26 07:23:59.754 DEBG Unknown parent for gossip block, root: 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189, module: network::beacon_processor::worker::gossip_methods:774

We process a parent chain referencing the child, but don't import any blocks (was_non_empty: false):

Sep 26 07:24:09.359 DEBG Parent chain processed, result: Success { was_non_empty: false }, chain_hash: 0x5e06…b189, service: sync, module: network::sync::block_lookups:603

We reject a descendant of the child:

Sep 26 07:24:14.339 DEBG Rejected gossip block, slot: 4779419, graffiti: Lido ⚡️Allnodes, error: ParentUnknown(parent_root:0x5e06…b189), service: beacon, module: beacon_chain::beacon_chain:2403
Sep 26 07:24:14.354 DEBG Unknown parent for gossip block, root: 0x45fd94dd71e8d21f59782110e91b335ba64c296b8a6dcc488871f55cc511d097, module: network::beacon_processor::worker::gossip_methods:774

We fail to process the parent due to a bad chain of block roots:

Sep 26 07:24:17.022 DEBG Processing parent lookup, blocks: 2, chain_hash: > 0x1b28…49c4, module: network::beacon_processor::worker::sync_methods:210
Sep 26 07:24:17.055 DEBG Invalid block received, outcome: NonLinearParentRoots, msg: peer sent invalid block, module: network::beacon_processor::worker::sync_methods:497
Sep 26 07:24:17.055 DEBG Parent lookup failed, error: Peer sent invalid block. Reason: NonLinearParentRoots, module: network::beacon_processor::worker::sync_methods:222
Sep 26 07:24:17.055 DEBG Parent chain processed, result: NonFaultyFailure, chain_hash: 0x1b28…49c4, service: sync, module: network::sync::block_lookups:603

Now, we reject another parent lookup which has a chain hash of the grandchild. Notably, this lookup contains 1 block (the child, I assume) and we downscore the peer because we didn't know the parent (but recall that we rejected the parent chain in the previous logs):

Sep 26 07:24:22.217 DEBG Parent lookup failed, error: Block has an unknown parent: 0x1b28…49c4, module: network::beacon_processor::worker::sync_methods:222
Sep 26 07:24:22.217 DEBG Parent chain processed, result: FaultyFailure { imported_blocks: false, penalty: LowToleranceError }, chain_hash: 0x45fd…d097, service: sync, module: network::sync::block_lookups:603

We import a parent chain with a chain hash of the child (5e06), however we set its parent (1b28) as the head. Fork choice always chooses the tip of the chain, so this means we didn't import 5e06 (only its parent):

Sep 26 07:24:23.350 DEBG Fork choice updated head, old_head: 0xebf4e958d6079a4b70f4ca3e0fcb806d8c3d86a0b3e032ce198649b2e4420d3d, old_head_weight: Some(431872000000000), new_head: 0x1b285b21f3d94a6d063791557b87ffe9c2f5f512553a306f72a413a3656049c4, new_head_weight: Some(0), service: beacon, module: beacon_chain::canonical_head:1072
Sep 26 07:24:23.350 DEBG Parent chain processed, result: Success { was_non_empty: true }, chain_hash: 0x5e06…b189, service: sync, module: network::sync::block_lookups:603

5e06 is imported via RPC:

Sep 26 07:24:33.232 INFO New RPC block received, hash: 0x5e06…b189, slot: 4779418, module: network::beacon_processor::worker::sync_methods:95

5e06 is set as head only a tiny bit late:

Sep 26 07:24:33.242 DEBG Fork choice updated head, old_head: 0x1b285b21f3d94a6d063791557b87ffe9c2f5f512553a306f72a413a3656049c4, old_head_weight: Some(430784000000000), new_head: 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189, new_head_weight: Some(0), service: beacon, module: beacon_chain::canonical_head:1072
Sep 26 07:24:33.280 DEBG Delayed head block, set_as_head_delay: Some(48.012041ms), imported_delay: Some(711.793779ms), observed_delay: Some(33.520171113s), block_delay: 34.279976933s, slot: 4779418, proposer_index: 431327, block_root: 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189, service: beacon, module: beacon_chain::canonical_head:1351

5e06 is no longer head:

Sep 26 07:25:28.930 DEBG Fork choice updated head, old_head: 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189, old_head_weight: Some(431648000000000), new_head: 0x45fd94dd71e8d21f59782110e91b335ba64c296b8a6dcc488871f55cc511d097, new_head_weight: Some(0), service: beacon, module: beacon_chain::canonical_head:1072

End

Analysis

It seems that things are generally working as expected, considering the user has low-spec hardware. It seems that a peer sent us invalid blocks (non-linear roots), however that's not interesting on its own.

What is interesting is why we have a Parent Chain Processed with chain_hash: 0x5e06…b189, but then later declare that 0x5e06…b189 is unknown.

I would think that's impossible since when a ParentLookup is created the child block (the one with the unknown parent) is automatically added to downloaded_blocks: https://github.com/sigp/lighthouse/blob/01e84b71f524968f5b940fbd2fa31d29408b6581/beacon_node/network/src/sync/block_lookups/parent_lookup.rs#L71

However, I did find that ParentLookup::chain_blocks will mem::take those downloaded_blocks and leave an empty vec in its place: https://github.com/sigp/lighthouse/blob/01e84b71f524968f5b940fbd2fa31d29408b6581/beacon_node/network/src/sync/block_lookups/parent_lookup.rs#L129

We can see in this code that we pull the parent lookup from parent_queue, run ParentLookup::chain_blocks, and then put it back in the parent_queue. This means that it's somewhat possible for a parent lookup to exist in the parent queue which does not contain the original child block (the one with the missing parent).

Looking at some more logs, we can see this sequence:

Sep 26 07:24:07.972 DEBG Processing parent lookup, blocks: 13, chain_hash: 0x5e06…b189, module: network::beacon_processor::worker::sync_methods:210

...redacted for brevity...

Sep 26 07:24:09.013 DEBG Requesting parent, chain_hash: 0x5e06…b189, request: parent, hash: 0x1b28…49c4, downloading_peer: 16Uiu2HAmQXbkSnwNXPgZosAe2kdNozHWSNcKVoEetDF1vQx2sPd1, failed_downloads: 0, failed_processing: 0, downloaded_blocks: 1, service: sync, module: network::sync::block_lookups:699

...redacted for brevity...

Sep 26 07:24:09.359 DEBG Parent lookup processed successfully, module: network::beacon_processor::worker::sync_methods:232
Sep 26 07:24:09.359 DEBG Parent chain processed, result: Success { was_non_empty: false }, chain_hash: 0x5e06…b189, service: sync, module: network::sync::block_lookups:603

Here we can see that whilst we are processing a chain segment we create another ParentLookup for the same chain hash.

When the chain segment was imported, it would have called BlockLookups::parent_chain_processed which would then try to remove the ParentLookup via the chain_hash. If it happened to remove the parent lookup that was created whilst it was running (rather than the one that triggered it), then I believe it's possible that the ParentLookup that exists in parent_queue does not have any downloaded blocks, therefore making it possible for a Parent Chain Processed log to be emitted whilst the chain_hash it references was not applied to the BeaconChain.

I haven't managed to link the logs all the way through to showing the end result of the parent being imported without the child, however this might be a good start towards that.