
Withholding attacks on Goerli:

Here's a short summary of the main points

  • All the attacks I've analyzed are vanilla withholding attacks.
  • At the start of epoch N, epoch N-1 has not been justified. A late block from epoch N-1 triggers an epoch-long reorg during epoch N. The chain advances on the attacker's branch until the next epoch transition into N+1, and during on_tick the chain reorgs back to the canonical chain due to pulled tips.
  • The attacks are occurring more often than we would predict from the size of the stake of the faulty Lighthouse node.

Some possible causes that contribute to the last point are:

  • During epoch N, honest validators on the attacker's branch and on the canonical branch vote differently towards the justification of N. Those on the attacker's branch vote source: N-1 --> target: N. Those on the canonical branch vote source: N-2 --> target: N. This makes it difficult to justify N, and therefore makes it more likely that the attack can be set up again during the next epoch.

  • Participation is very low during N+1, and this is unexpected: the chain should be canonical from slot 0. We find many blocks at the beginning of this epoch based on the attacker's branch in N. This can only happen if the proposer does not have pulled tips enabled or if the implementation has a faulty on_tick.

  • We see many proposal misses even after a canonical block has been proposed in N+1 (thus realizing the justification of the canonical branch in N). Particularly, we see consistent misses by Nimbus and Prysm (not counting the many proposers that are not known). This leads me to believe that there are bugs in both Nimbus and Prysm's implementations. We are aware of some bugs in Prysm in these situations.

  • Given the above, I would recommend we ship the fix for withholding as soon as possible.

Goerli reorg 890 > 862:

This reorg is a vanilla withholding attack. The forkchoice tree looks like

Epoch 117401  |             117402

3756861 <------- 864 <- 865 <-- ... <- 890

Block 861 has not justified epoch 117401 yet, not even as an unrealized justification:

{
      "slot": "3756861",
      "root": "0xa9cd348f30953dc7f83db46713e20fb32d94c6f957eafd6233bcf38f3353b146",
      "parent_root": "0x421d23cadacf49771c2ef18f0566fa15d160aff4fe5cc6172fa72c3bd5a78e26",
      "justified_epoch": "117400",
      "finalized_epoch": "117397",
      "unrealized_justified_epoch": "117400",
      "unrealized_finalized_epoch": "117397",
      "balance": "351000000000",
      "weight": "9789365875000000",
      "execution_optimistic": false,
      "execution_payload": "0x46bcd21147a0379421f4dbfd1434cc3951b54096283597abbd68b5911296ac29"
},

By the time 864 is due we already have enough votes to justify 117401; that is, block 864 has this epoch unrealized justified:

{
      "slot": "3756864",
      "root": "0xf6639992f9ec7815515145731971e89fe77366777d09e4ef549078763ac90af1",
      "parent_root": "0xa9cd348f30953dc7f83db46713e20fb32d94c6f957eafd6233bcf38f3353b146",
      "justified_epoch": "117401",
      "finalized_epoch": "117400",
      "unrealized_justified_epoch": "117401",
      "unrealized_finalized_epoch": "117400",
      "balance": "0",
      "weight": "9784950875000000",
      "execution_optimistic": false,
      "execution_payload": "0x68747a9fa9c09adec787fef0377acfc1bab767fb9e571c6abb62f0727afea9a1"
}

You can disregard the "justified_epoch" values, since they were realized later when the chain advanced. At the time the block arrives, this epoch could not have been justified, since the votes were not on-chain.

The chain advances during this epoch until our head block is 890, whose FFG information has not changed since 864 at the beginning of the epoch. This block is received on time (1.2 seconds on Prysm's cluster)

{
      "slot": "3756890",
      "root": "0xcab5402d27b4b9abb098ecaa0c2ed1d6712165b8c2057f3c38fc48555bfb3b34",
      "parent_root": "0x3e7384d96f6c595c02e7130a233f16c09d67b7c813f46314a4a505bcea3d3e2d",
      "justified_epoch": "117401",
      "finalized_epoch": "117400",
      "unrealized_justified_epoch": "117401",
      "unrealized_finalized_epoch": "117400",
      "balance": "0",
      "weight": "9784726875000000",
      "execution_optimistic": false,
      "execution_payload": "0xb3b90c21085ec5c03ae19ba11c7ab449cc6d20cc97df87d0bf54841f89931c6e"
},

Notice the "balance": "0" value. This means no one attested to this block.

At this moment we receive block 862, which has the same FFG information:

{
      "slot": "3756862",
      "root": "0xfa8b9322d78567457d85b9606c4f2268c297389793f0a70da7000aeaa64870ef",
      "parent_root": "0xa9cd348f30953dc7f83db46713e20fb32d94c6f957eafd6233bcf38f3353b146",
      "justified_epoch": "117401",
      "finalized_epoch": "117400",
      "unrealized_justified_epoch": "117401",
      "unrealized_finalized_epoch": "117400",
      "balance": "1760000000000",
      "weight": "4064000000000",
      "execution_optimistic": false,
      "execution_payload": "0xa3bcdfb70138b599ed08091cdacb0eb1c901e50618044c4c1ede96dd3e4067cf"
},

This block arrives 8 seconds into 890's slot in Prysm's cluster. Being a block from a previous epoch, this block's unrealized justification is realized immediately and beats 890's FFG info.
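
The rule described above can be sketched in a few lines. This is a minimal illustration, not Prysm's actual code: the `Node` class and `insert_block` function are hypothetical names, and the realized value 117400 before pull-up is an assumption taken from block 861's dump, since the dumps for 862 and 890 were captured after realization.

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 32


@dataclass
class Node:
    """Hypothetical forkchoice node, mirroring fields from the dumps."""
    slot: int
    justified_epoch: int
    unrealized_justified_epoch: int


def epoch_of(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH


def insert_block(node: Node, current_slot: int) -> Node:
    """Realize the unrealized justification if the block is from a past epoch."""
    if epoch_of(node.slot) < epoch_of(current_slot):
        node.justified_epoch = max(node.justified_epoch,
                                   node.unrealized_justified_epoch)
    return node


# Head block 890 arrives in its own epoch, so nothing is pulled up.
head = insert_block(Node(3756890, 117400, 117401), current_slot=3756890)
# Late block 862 is from the previous epoch, so it is pulled up on arrival.
late = insert_block(Node(3756862, 117400, 117401), current_slot=3756890)

print(head.justified_epoch, late.justified_epoch)
```

Under these assumptions the late block ends up with a higher realized justified epoch than the head, which is why it wins despite having almost no weight.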

Reorg 637->638

Forkchoice tree is

Epoch 117518 |   519 ....
3760605 <------- 608 <-- ... <- 635
   \--606 (late 6'9", during 636) 

The common ancestor does not have epoch 518 unrealized justified yet:

    {
      "slot": "3760605",
      "root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "parent_root": "0x5f50c5ab2854812003eff9d58a6cfa8f437a76652cf0a14c641ba5cbad741787",
      "justified_epoch": "117517",
      "finalized_epoch": "117516",
      "unrealized_justified_epoch": "117517",
      "unrealized_finalized_epoch": "117516",
      "balance": "11040000000000",
      "weight": "9165071000000000",
      "execution_optimistic": false,
      "execution_payload": "0xd7ca9b3c24d5cff8d1c38b61d768fcc2cca78f6729265ea717539ac5a4aba800"
    },

Block 606, which arrives late during slot 636, does. Block 606 is slot 30 in its epoch

    {
      "slot": "3760606",
      "root": "0xdb7558502743338474867426d9ad99febc2ac907953ceedbb0fd96d261fc4617",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "justified_epoch": "117518",
      "finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "unrealized_finalized_epoch": "117517",
      "balance": "53408000000000",
      "weight": "70112000000000",
      "execution_optimistic": false,
      "execution_payload": "0x66fa817498d2e664e6c074939c6977a88fd2c006152d2fac188e739968a0dd6b"
    },

Slot 635 (the one that was head at the time) could not have had this justified information, being slot 27 in the epoch.

{
      "slot": "3760608",
      "root": "0x03a03f4bcc8d99c3cc793a571d9c0160ff37bdc108d6bc0acf89b46f6780ee28",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "justified_epoch": "117518",
      "finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "unrealized_finalized_epoch": "117517",
      "balance": "11584000000000",
      "weight": "9083919000000000",
      "execution_optimistic": false,
      "execution_payload": "0xe7659b1956cdf66dbc9fd5bed257736296aaf9f9d7e724fcde91b1daf4bed932"
    },
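
The slot positions quoted above can be checked with a small helper. This sketch assumes 32 slots per epoch; the function name `epoch_and_index` is made up for illustration.

```python
SLOTS_PER_EPOCH = 32  # assumed preset value


def epoch_and_index(slot: int) -> tuple[int, int]:
    """Return (epoch, slot index within that epoch) for an absolute slot."""
    return divmod(slot, SLOTS_PER_EPOCH)


for slot in (3760605, 3760606, 3760608, 3760635, 3760636):
    epoch, index = epoch_and_index(slot)
    print(f"slot {slot}: epoch {epoch}, index {index}")
```

This gives index 30 in epoch 117518 for slot 3760606 and index 27 in epoch 117519 for slot 3760635, matching the text.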

Reorgs starting at 3760606

During epoch 117518, the last block of the canonical chain in that epoch is the following, in slot 3760605 (slot 29 of the epoch)

    {
      "balance": "179061000000000",
      "execution_optimistic": false,
      "execution_payload": "0xd7ca9b3c24d5cff8d1c38b61d768fcc2cca78f6729265ea717539ac5a4aba800",
      "finalized_epoch": "117516",
      "justified_epoch": "117517",
      "parent_root": "0x5f50c5ab2854812003eff9d58a6cfa8f437a76652cf0a14c641ba5cbad741787",
      "root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "slot": "3760605",
      "unrealized_finalized_epoch": "117516",
      "unrealized_justified_epoch": "117517",
      "weight": "7710943000000000"
    },

We had been finalizing well before this point, but this epoch did not have enough information on-chain to justify it, as we can see from the above unrealized justified checkpoints.

The chain continues during the next epoch 117519 until the head 3760635, with this forkchoice information

{
      "balance": "272442000000000",
      "execution_optimistic": false,
      "execution_payload": "0xd66451720491fa9be90e937801502a6d7ecdb6111cbec2cdf04b821d4780adf2",
      "finalized_epoch": "117516",
      "justified_epoch": "117517",
      "parent_root": "0x99c409327586be23118ae1137d4271ace1b077f2e72f423958c8e8a8f25cbde1",
      "root": "0x44016b416cb6d1f637b44d7e6c554e1e3147c27f2e0b76ecda723a5438b39839",
      "slot": "3760635",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "272442000000000"
    },

So this block had enough FFG information to justify the previous epoch but it was only unrealized.

When block 3760606 arrives, it becomes head immediately. At that time its forkchoice information is

    {
      "balance": "0",
      "execution_optimistic": false,
      "execution_payload": "0x66fa817498d2e664e6c074939c6977a88fd2c006152d2fac188e739968a0dd6b",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "root": "0xdb7558502743338474867426d9ad99febc2ac907953ceedbb0fd96d261fc4617",
      "slot": "3760606",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "0"
    },

We see that the weight and balance are still zero, but this node has justified 117518. It arrives during slot 3760636 in the next epoch, 30 slots later. It is a typical withholding pattern. As in all the cases above, the forkchoice tree looks like

Epoch 117518   |  117519

3760605 <--------- 608 <-- .... <-- 635
   \------606 (late)

This time (again) the attack happened on slot 30, and Lighthouse's node benefits from Nimbus missing slot 31. The epoch in question had 6 missed blocks before the attack. Beaconcha.in shows a participation of around 74% on those epochs, which is consistent with this number.

The chain advances, and in the next head event we see that the attacker's block got votes:

    {
      "balance": "220356000000000",
      "execution_optimistic": false,
      "execution_payload": "0x66fa817498d2e664e6c074939c6977a88fd2c006152d2fac188e739968a0dd6b",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "root": "0xdb7558502743338474867426d9ad99febc2ac907953ceedbb0fd96d261fc4617",
      "slot": "3760606",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "220356000000000"
    },

Balance equals weight because there is no child with votes on this block. During this slot, block 637 arrives and is based on the attacker's block. It hasn't got any votes yet:

    {
      "balance": "0",
      "execution_optimistic": false,
      "execution_payload": "0xd47983764ebfa81aabdd8fa70c1efca858f3fb613dfde6f2507d24acb9d30480",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0xdb7558502743338474867426d9ad99febc2ac907953ceedbb0fd96d261fc4617",
      "root": "0x916f34d02503dc8c39cfc5485bef6c1cbd1bc8949e7b8c1df511ca7d4d3c8cd4",
      "slot": "3760637",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "0"
    },
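
The relation between "balance" and "weight" used in this reading of the dumps can be sketched as follows. This is a simplified model with a hypothetical `Node` class, assuming a node's weight is its own balance plus the weight of all its descendants. It is not a copy of any client's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical forkchoice node: `balance` is the stake voting directly for it."""
    balance: int  # Gwei
    children: list["Node"] = field(default_factory=list)

    def weight(self) -> int:
        """Own balance plus the weight of every descendant."""
        return self.balance + sum(child.weight() for child in self.children)


# Values from the two dumps above: 637 has no votes yet, 606 has all of them.
block_637 = Node(balance=0)
block_606 = Node(balance=220_356_000_000_000, children=[block_637])

print(block_606.weight() == block_606.balance)
```

With a voteless child, the parent's weight collapses to its own balance, which is what the dump for 3760606 shows.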

Two seconds later there is a new reorg. It is not caused by a block, but rather by on_tick at the epoch transition. This happens in slot 3760640. There are a few points to notice here:

  • Right after the attack a block 637 is proposed based on the attacker's block (this is proposed by Lighthouse). 638 does not build on it; it builds on top of the attacker's block 636. Prysm does not take this block as head. It does not see any block in 639.
  • The epoch transition happens at Unix time 1661635680. This is the only head event that Prysm receives after syncing 637. During on_tick the canonical chain at 635 justifies epoch 117518, since it already had it as unrealized justified. We can see from the forkchoice snapshot that this node now has FFG information that competes with the attacker's block, and the chain reorgs back to 635 because of pulled tips:
    {
      "balance": "578720000000000",
      "execution_optimistic": false,
      "execution_payload": "0xd66451720491fa9be90e937801502a6d7ecdb6111cbec2cdf04b821d4780adf2",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0x99c409327586be23118ae1137d4271ace1b077f2e72f423958c8e8a8f25cbde1",
      "root": "0x44016b416cb6d1f637b44d7e6c554e1e3147c27f2e0b76ecda723a5438b39839",
      "slot": "3760635",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "578720000000000"
    },
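
The epoch-boundary behaviour described above can be sketched as follows. This is an illustration only: `Node`, `slot_at` and `on_tick` are hypothetical names, and the genesis time 1616508000 and the 12-second slot duration are assumptions that do not appear in these notes. They are consistent with the timestamp 1661635680 falling on slot 3760640.

```python
from dataclasses import dataclass

SECONDS_PER_SLOT = 12       # assumed slot duration
SLOTS_PER_EPOCH = 32        # assumed preset value
GENESIS_TIME = 1616508000   # assumed Goerli/Prater beacon genesis time


@dataclass
class Node:
    slot: int
    justified_epoch: int
    unrealized_justified_epoch: int


def slot_at(unix_time: int) -> int:
    """Slot number that contains the given Unix timestamp."""
    return (unix_time - GENESIS_TIME) // SECONDS_PER_SLOT


def on_tick(unix_time: int, nodes: list[Node]) -> bool:
    """At the first slot of an epoch, realize every unrealized justification.

    Returns True when the tick falls on an epoch boundary.
    """
    if slot_at(unix_time) % SLOTS_PER_EPOCH != 0:
        return False
    for node in nodes:
        node.justified_epoch = max(node.justified_epoch,
                                   node.unrealized_justified_epoch)
    return True


# Head 635 before the transition, with values from its earlier dump.
head_635 = Node(slot=3760635, justified_epoch=117517,
                unrealized_justified_epoch=117518)
pulled = on_tick(1661635680, [head_635])
print(slot_at(1661635680), pulled, head_635.justified_epoch)
```

After the tick, 635 carries justified epoch 117518, so it competes on equal FFG terms with the attacker's block and the much larger LMD weight of the canonical branch decides the head.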

From the point of view of the canonical chain, epoch 117519 has many missed slots, since everything that built on top of the attacker's block is orphaned at the epoch transition. In this particular case there were also lots of missed blocks in that epoch. But even if there had been none, honest validators' votes on the attacker's branch would not count towards the justification of 117519, since on this branch honest validators are voting

source: 117518  ---->  target: 117519

While the canonical chain (that eventually wins) voted during this Epoch as

source: 117517 ----> target: 117519

This means that justifying the current epoch during an attack is much harder than with full participation. Hence, if the attacker has a late block proposal during this epoch, they can pull off another attack in consecutive epochs.
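
The effect of the split votes can be illustrated with a small calculation. The participation and branch shares below are hypothetical numbers chosen only for illustration. The sketch assumes the usual FFG rule that a target is justified once at least 2/3 of the stake votes for it with a source the chain itself considers justified.

```python
from fractions import Fraction


def justifies(matching_stake: Fraction) -> bool:
    """True when the share of stake with a matching source reaches 2/3."""
    return matching_stake * 3 >= 2


participation = Fraction(90, 100)       # hypothetical share of stake attesting
on_attacker_branch = Fraction(30, 100)  # hypothetical share voting 117518 -> 117519

# On the canonical chain only the 117517 -> 117519 votes have a matching source.
counted_on_canonical = participation - on_attacker_branch

print(justifies(participation), justifies(counted_on_canonical))
```

With these made-up numbers, 90% participation would justify the epoch if everyone voted on one branch, but the 60% left on the canonical branch falls short of 2/3.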

The next epoch:

The situation painted above is the typical situation for a very long withholding attack: it survives until the beginning of the next epoch, and pulled tips takes care of reorging back to the canonical chain. There is no reason for the chain not to advance correctly during this epoch. However, we see many missed slots in these epochs where the chain has reorged. There is no good explanation for this. In the example we analyzed above, the next epoch is 117520, and it has 22 missed/orphaned slots.

One way of explaining this could be clients that are not running pulled tips: these clients would not have reorged during on_tick and thus would continue the attacker's branch until a block realizes the same justification.

From the perspective of the Prysm node giving the forkchoice dumps, the first slot, 640, is missed. 641 is based on 638 on the attacker's branch. 642 and 643 are missing, and 644 is still based on the attacker's branch (it's based on 641). 645 is the first canonical block (and the first proposed by a known entity, in this case the EF) and is built on top of 635. This block realizes the justification, and thus after this block even nodes without pulled tips enabled should be able to follow the canonical chain. After 645 is inserted, the canonical branch (which forks off at 608) and the attacker's branch (which starts at 606) have the following LMD weights. The attacker has 38 464 ETH:

    {
      "balance": "29440000000000",
      "execution_optimistic": false,
      "execution_payload": "0x66fa817498d2e664e6c074939c6977a88fd2c006152d2fac188e739968a0dd6b",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "root": "0xdb7558502743338474867426d9ad99febc2ac907953ceedbb0fd96d261fc4617",
      "slot": "3760606",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "38464000000000"
    },

While the canonical block has 9 270 190 ETH

    {
      "balance": "6112000000000",
      "execution_optimistic": false,
      "execution_payload": "0xe7659b1956cdf66dbc9fd5bed257736296aaf9f9d7e724fcde91b1daf4bed932",
      "finalized_epoch": "117517",
      "justified_epoch": "117518",
      "parent_root": "0x65393d57ff79d796993acebe886a47cd75c572059ab4a7c8c9d7688b82bb87fd",
      "root": "0x03a03f4bcc8d99c3cc793a571d9c0160ff37bdc108d6bc0acf89b46f6780ee28",
      "slot": "3760608",
      "unrealized_finalized_epoch": "117517",
      "unrealized_justified_epoch": "117518",
      "weight": "9270190000000000"
    },
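
The ETH figures quoted for the two branches follow from the "weight" fields, assuming they are expressed in Gwei (10^9 Gwei per ETH). A quick check, with variable names chosen for illustration:

```python
GWEI_PER_ETH = 10**9

attacker_weight_gwei = 38_464_000_000_000       # block 3760606
canonical_weight_gwei = 9_270_190_000_000_000   # block 3760608

attacker_eth = attacker_weight_gwei // GWEI_PER_ETH
canonical_eth = canonical_weight_gwei // GWEI_PER_ETH
ratio = canonical_eth / attacker_eth

print(attacker_eth, canonical_eth, round(ratio))
```

The canonical branch carries roughly 241 times the weight of the attacker's branch.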

So there can't be any contention of weights. Indeed, the next 2 slots are also proposed and canonical. Nimbus and Prysm miss the next two blocks. In fact, in all the cases that I have analyzed, the low participation in the next epoch is due to Nimbus, Prysm and unknown proposers. This leads me to believe that there are bugs in these implementations under long reorgs.