
A quick intro to reorgs

Theory

The CAP theorem implies that no blockchain can be both live under dynamic participation and safe under temporary network partitions. To design a distributed network that reaches consensus reliably, trade-offs have to be made between liveness, safety and asynchrony, because it is not possible to guarantee all three properties at the same time.

  • Safety refers to the notion that decisions cannot be reverted once written to the network.
  • Liveness is the ability for the network to still be available and provide a service even if a large portion of the nodes are offline.
  • Asynchrony: an asynchronous protocol has no bound on how long a message may take to be delivered.

In practice, the real world introduces challenges such as unreliable communication, non-compliant protocol behaviour, and system crashes, all of which can compromise liveness, safety, and the assumptions made about message delivery.
In the context of reorgs, it is important to highlight that even a live and safe blockchain protocol will, by design, exhibit reorgs from time to time under network partitions. Hence, a method to resolve these forks is needed to ensure that participants reach consensus.

Reorgs attack liveness: they remove honest blocks from the chain, which reduces the network's quality of service and affects both the latency and the capacity of the chain. Higher latency creates uncertainty around transaction confirmation, while lower capacity reduces transaction throughput, which in effect raises transaction fees.

Motivations behind reorging blocks

The reorgability of a chain, combined with the MEV that can be extracted, gives participants an incentive to deviate from honest strategies, which ultimately undermines the security of the network.

Some of the key impacts and motivations for reorging a block include:

  • Re-capturing highly valuable MEV in individual or consecutive blocks
  • Front-running and back-running transactions
  • Double spending across multiple chains (for example, making a very large transaction on Ethereum to receive goods on another chain, and then reorging the Ethereum transaction later on)
  • Causing finality delays to degrade end users' trust in financial applications built on top of Ethereum
  • Partitioning the network to disrupt Ethereum and profit from strategic short position(s)
  • and many more long-tail reasons…

Fork choice rules

As of now, there are three main approaches used to determine the canonical chain when there is a fork. These include:

  1. Nakamoto algorithm (longest chain rule): In PoW protocols, the longest chain rule is applied to determine the head of the chain. If the network partitions, the chain with the most accumulated work is taken to be the canonical chain. PoW has no finality mechanism, so blocks can in principle be reverted at any depth as long as the competing chain has more work done on it.
  2. GASPER is a combination of LMD GHOST and Casper FFG. LMD GHOST is the fork choice rule used to determine the head of the canonical chain. Given a fork in the network, GHOST selects the fork that has the most votes, counting the votes for each fork block together with those for its descendant blocks. Casper FFG is a mechanism that favours safety over liveness. It is used to finalize blocks and ensures that, under normal conditions, blocks cannot be reverted once they are more than 2 epochs (about 12.8 minutes) old.
  3. Tendermint has single-block finality (1-10 s), and reorgs can never occur as it prioritises safety.

The focus here is on the longest chain and LMD GHOST fork choice rules, to gain a deeper understanding of how reorgs come about. In PoW, single-block reorgs are more common: a block can be reorged by a miner that mines the next 2 consecutive blocks on a competing fork. In Eth post-merge, however, reorg attacks are harder to achieve, since in addition to the block proposer there are over 10,000 randomly assigned individual attesters per slot that an attacker needs to overcome.

How Reorgs occur

Traditionally, on pre-merge Eth, reorgs took place because multiple miners raced each other to mine and propagate valid blocks. Since finding a valid block is probabilistic under PoW, there were points in time where two blocks were found at almost the same time. Both blocks then competed to be the head of the chain, splitting the rest of the honest miners, and the chain was reorganized (a reorg) once one side won. In these instances a fork choice rule is used to decide which block is the head of the canonical chain.

A simple example of a pre-merge reorg that may arise due to network propagation issues can be viewed below:

[Image not available]

Example walkthrough

  • Two miners find a block at the same time and publish the blocks to the network.
  • This leads to a partition in the network, as miners initially have to choose arbitrarily which block to mine on top of.
  • The next miner mines Block 3(a) referencing Block 2(a), and the miner after that mines Block 4(a) referencing Block 3(a) as its parent block.
  • At this point there is considerably more work done on chain A. Hence, even if Block 4(b) builds on top of Block 2(b), honest miners using the longest chain fork choice rule will drop chain B and continue to mine on chain A.
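
The walkthrough above can be sketched in a few lines of Python. This is only an illustration, not client code: the `Block` class and the `total_work` and `canonical_tip` helpers are hypothetical names, and every block is assumed to carry one unit of work so that "most work" and "longest" coincide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Block:
    name: str
    parent: Optional[str]  # name of the parent block, None for the first block
    work: int = 1          # assumption: every block carries one unit of work


def total_work(blocks: Dict[str, Block], tip: str) -> int:
    """Sum the work on the path from `tip` back to the first block."""
    work = 0
    current: Optional[str] = tip
    while current is not None:
        block = blocks[current]
        work += block.work
        current = block.parent
    return work


def canonical_tip(blocks: Dict[str, Block]) -> str:
    """Return the tip of the chain with the most accumulated work."""
    parents = {b.parent for b in blocks.values()}
    tips = sorted(name for name in blocks if name not in parents)
    # Sorting only makes tie-breaking deterministic in this sketch.
    return max(tips, key=lambda name: total_work(blocks, name))


# Block names follow the walkthrough in the text.
chain = [
    Block("1", None),
    Block("2a", "1"), Block("2b", "1"),    # two miners find a block at once
    Block("3a", "2a"), Block("4a", "3a"),  # chain A is extended twice
    Block("4b", "2b"),                     # chain B is extended once
]
blocks = {b.name: b for b in chain}
print("canonical tip:", canonical_tip(blocks))
```

Chain A accumulates four units of work against three on chain B, so a miner following this rule keeps building on Block 4(a) and the blocks on chain B are dropped.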

Most often, reorgs occur due to data propagation issues. For example, if a miner mines a block but cannot propagate it through the network fast enough, the block will not be seen by the rest of the network in time. Subsequent miners that find a block will not treat it as the head of the chain and will instead mine on top of a previously existing block.

It is important to mention that reorgs can also be caused by malicious actors. Miners can gain more of the protocol-prescribed rewards by following a dishonest policy. An example of this is selfish mining, where miners mine blocks privately without publishing them to the network. Once their private chain is ahead of the public chain, they release the N consecutive blocks mined in private, reverting the rewards given to honest miners for the displaced blocks and at the same time re-extracting any MEV that those blocks had captured. Under the longest chain fork choice rule, honest miners will mine on top of the reorged chain, as it is provably longer and has more work behind it.

Although it is not easy to mine consecutive blocks, it may well be in the interest of large mining pools to mine selfishly to maximise their profits, cutting into the profits of honest miners in the long run.

[Image not available]
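
To give a rough feel for how hard it is to mine consecutive blocks, the sketch below assumes that each block is won independently with probability equal to the miner's share of the total hash rate. Under that simplifying assumption the chance of winning the next k blocks in a row is the share raised to the power k. The function name and the example shares are hypothetical.

```python
def consecutive_block_probability(hash_share: float, k: int) -> float:
    """Probability of mining the next k blocks in a row.

    Assumes each block is won independently with probability `hash_share`,
    which ignores propagation delays and difficulty adjustments.
    """
    if not 0.0 <= hash_share <= 1.0:
        raise ValueError("hash_share must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return hash_share ** k


for share in (0.10, 0.25, 0.40):
    p = consecutive_block_probability(share, 2)
    print(f"{share:.0%} of the hash rate: {p:.2%} chance of 2 blocks in a row")
```

The probability falls off quickly for small miners, which is why this strategy is mainly a concern for large pools.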


Brief overview of PoS and the block finalization process

In Eth post-merge, time is divided into epochs of 32 slots, each slot being 12 seconds long. The security of the PoS network derives from the ether that validators have staked, not from mining hash rate as it once did.
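
The timing arithmetic can be written out as a short sketch. The constants come from the paragraph above; the helper names are hypothetical and chosen only for this example.

```python
SECONDS_PER_SLOT = 12   # from the text: each slot is 12 seconds long
SLOTS_PER_EPOCH = 32    # from the text: an epoch consists of 32 slots


def epoch_of_slot(slot: int) -> int:
    """Epoch that a given slot number belongs to."""
    return slot // SLOTS_PER_EPOCH


def is_epoch_boundary(slot: int) -> bool:
    """True if the slot is the first slot of an epoch."""
    return slot % SLOTS_PER_EPOCH == 0


def epoch_duration_seconds() -> int:
    """Length of one epoch in seconds."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


print("seconds per epoch:", epoch_duration_seconds())
print("minutes per epoch:", epoch_duration_seconds() / 60)
print("minutes for two epochs:", 2 * epoch_duration_seconds() / 60)
```

One epoch therefore lasts 384 seconds (6.4 minutes), and two epochs last 12.8 minutes.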

Unlike miners in PoW, one validator is randomly selected to propose a block in every slot. At the start of every epoch (32 slots), every validator is randomly assigned to a committee and is incentivised to attest to the head of the chain in its assigned slot. A block is said to be 'safe' if it receives more than 2/3 of the attestations from validators, which puts it at the head of the chain.

Given a fork in the network, the fork choice rule used by Eth PoS is called Latest Message Driven Greedy Heaviest Observed Subtree (LMD GHOST). This rule determines the head of the chain by counting the votes for each fork block together with those for its descendant blocks, and at each fork it selects the branch with the heaviest weight. A visual example of the decision-making can be seen below, where there are multiple forks at a given slot:

[Image not available]
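
The rule described above can be sketched as follows. This is a simplified illustration rather than the consensus specification: the block tree, the validator names, the balances and the helper functions are all hypothetical, and details such as proposer boost and vote expiry are left out.

```python
from typing import Dict, Optional

Parents = Dict[str, Optional[str]]  # block -> parent block (None for the root)


def is_descendant(parents: Parents, block: str, ancestor: str) -> bool:
    """True if `block` is `ancestor` itself or builds on top of it."""
    current: Optional[str] = block
    while current is not None:
        if current == ancestor:
            return True
        current = parents[current]
    return False


def subtree_weight(parents: Parents, latest_votes: Dict[str, str],
                   balances: Dict[str, int], block: str) -> int:
    """Stake of validators whose latest vote is for `block` or a descendant."""
    return sum(balances[validator]
               for validator, voted in latest_votes.items()
               if is_descendant(parents, voted, block))


def ghost_head(parents: Parents, latest_votes: Dict[str, str],
               balances: Dict[str, int], root: str) -> str:
    """Walk down from `root`, always taking the heaviest child subtree."""
    head = root
    while True:
        children = sorted(b for b, p in parents.items() if p == head)
        if not children:
            return head
        head = max(children, key=lambda child: subtree_weight(
            parents, latest_votes, balances, child))


# Fork at A: branch B-D-G is longer, branch C-(E|F) has more votes.
parents: Parents = {"A": None, "B": "A", "D": "B", "G": "D",
                    "C": "A", "E": "C", "F": "C"}
latest_votes = {"v1": "D", "v2": "B", "v3": "E", "v4": "F", "v5": "F"}
balances = {validator: 1 for validator in latest_votes}

print("head:", ghost_head(parents, latest_votes, balances, "A"))
```

In this made-up tree the branch through B is longer, but the branch through C carries more votes, so the head ends up on the C side. This is the main difference from the longest chain rule.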

Validators also publish two additional votes concerning Casper FFG. The Casper Friendly Finality Gadget (FFG) operates on top of the chain and is used to determine finalized blocks at marked checkpoints. These checkpoints are found at each epoch boundary block, and all validators collectively vote to move the finality gadget from the start of one epoch to the next. Casper FFG is used as a safety mechanism to counter large validators wanting to revert the chain by a whole epoch or more. A block that has been finalized is considered extremely unlikely to be reorganized, unless a two-thirds majority of validators agree to finalize a competing chain of blocks. Finalized blocks are blocks that are one epoch behind the most recently justified block. The path that the Friendly Finality Gadget takes can be seen in the diagram below:

[Image not available]
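
A rough sketch of the two-thirds check and of the "one epoch behind" relationship is shown below. It is a simplification under stated assumptions: votes are given as the total stake voting for each epoch's checkpoint, an epoch counts as justified when at least two-thirds of the stake votes for it, and it counts as finalized when the following epoch is justified as well. The function names and the stake figures are hypothetical.

```python
from typing import Dict, List


def is_justified(voting_stake: int, total_stake: int) -> bool:
    """Two-thirds supermajority check, done in integers to avoid rounding."""
    return 3 * voting_stake >= 2 * total_stake


def finalized_epochs(stake_by_epoch: Dict[int, int],
                     total_stake: int) -> List[int]:
    """Epochs whose checkpoint and whose successor are both justified."""
    justified = {epoch for epoch, stake in stake_by_epoch.items()
                 if is_justified(stake, total_stake)}
    return sorted(epoch for epoch in justified if epoch + 1 in justified)


# Hypothetical stake (out of 1000) voting for the checkpoint of each epoch.
stake_by_epoch = {10: 700, 11: 680, 12: 500, 13: 900}
print("finalized:", finalized_epochs(stake_by_epoch, total_stake=1000))
```

In this example epochs 10 and 11 are justified back to back, so epoch 10 is finalized. Epoch 12 misses the threshold, which leaves epoch 11 justified but not finalized.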

Reorgs in Eth post-merge

In Eth PoS even if the probabilistic aspect of when blocks are found is removed (due to having a designated block proposer every 12 secs), the network may still suffer from data propagation issues which result in reorgs and forks. A non-malicious block reorg can be seen below:

[Image not available]

Example walkthrough

  • In the example above, proposer B fails to propose a block at slot N+1, and the following proposer C builds on top of Block A.
  • Due to network latency, Block C is not seen by the rest of the network in time. Consequently, the majority of the attestations at slots N+1 and N+2 point to Block A as the head of the chain. At this point Block C has no attestations and a weight of zero.
  • At slot N+3, proposer D's view of the chain is shown below the red line in the diagram: two missing blocks are observed. Proposer D releases Block D on time, and honest validators that follow proposer score boosting will favour Block D.
  • Eventually Block C becomes an uncle (orphaned) block, even after it becomes visible to the rest of the network. Block D competes against Block C to be the head of the chain, as both refer to the same parent block. Since Block D has more attestations and therefore more weight, honest validators using LMD GHOST will see Block D as the head of the chain.

Malicious Reorg attacks in PoS

Although malicious reorgs are harder to execute on Eth PoS, they are still possible if an attacker controls around 1/3 of the staked ETH for a brief moment. As of today it is difficult to say whether one entity could maliciously coordinate 1/3 of the attestations to attack and undermine the network. It is important to note, however, that around 45% of staked ETH resides with only three entities (Lido 26%, Coinbase 12% and Kraken 7%). A further 28% of the total stake is undisclosed, and a large portion of it may still belong to medium to large node operators. Hence, small malicious reorg attacks to extract more MEV and rewards can be feasible.

An example of a malicious single block reorg can be viewed below:

[Image not available]

Example walkthrough

  • At slot N, Block N is released on time and receives a total of three attestations.
  • During slot N+1, the attacker proposes Block N+1 privately and does not publish it to the rest of the network. In addition, the attacker privately attests to the private block.
  • At slot N+1, the rest of the validators do not see the private block and hence attest to Block N.
  • The proposer at slot N+2 cannot see the private block either, and hence builds their block referencing Block N. The attacker then releases the private block slightly earlier than Block N+2, revealing the private block and the attacker's attestations at both N+1 and N+2.
  • As expected, this creates a fork in the network, as both Block N+2 and the attacker's Block N+1 refer to Block N as their parent block.
  • At this point, the validators in slot N+2 see the private block before Block N+2 and attest to it, since it has more weight: it carries the attacker's attestations from slot N+1 and slot N+2.
  • Even if half of the validators attest to Block N+2, the attacker still has the advantage, as they have used their attestations in both slot N+1 and slot N+2 to vote for Block N+1, giving it more weight than Block N+2. Under the GHOST fork choice rule, Block N+1 is heavier and is seen as the head of the chain, which completes the reorg. Block N+2 is dropped from the canonical chain and disappears from the other validators' view of the chain.

The main reason the attacker is successful is that both Block N+1 and Block N+2 inherit the weight of Block N, but Block N+1 also has the private attestation from slot N+1, which gives it more weight under the fork choice rule. As slot N+2 commences, Block N+1 is deemed to be heavier and the majority of validators will vote for Block N+1. The attacker's attestation at N+2 adds further to the advantage of Block N+1 in being deemed the head of the chain.
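
The weight accounting behind this walkthrough can be sketched as follows. It is a simplified model with hypothetical names: each slot's attesters are normalised to a total weight of 1, the attacker controls the same fraction of attesters in every slot, there is no proposer boost, and honest votes for Block N in slot N+1 count for neither side of the fork.

```python
from typing import Tuple


def fork_weights(attacker_share: float,
                 honest_on_private: float) -> Tuple[float, float]:
    """Return (weight of Block N+1, weight of Block N+2) after slot N+2.

    attacker_share: fraction of each slot's attesters held by the attacker.
    honest_on_private: fraction of honest slot N+2 attesters that see the
        private block first and vote for it.
    """
    honest_share = 1.0 - attacker_share
    # Slot N+1: the attacker attests privately to Block N+1.
    private_weight = attacker_share
    # Slot N+2: the attacker attests to Block N+1 again.
    private_weight += attacker_share
    # Slot N+2: honest attesters split between the two competing blocks.
    private_weight += honest_on_private * honest_share
    public_weight = (1.0 - honest_on_private) * honest_share
    return private_weight, public_weight


private, public = fork_weights(attacker_share=0.10, honest_on_private=0.50)
print(f"Block N+1 weight: {private:.2f}, Block N+2 weight: {public:.2f}")
```

With an even split of honest attesters, any non-zero attacker share tips the balance towards Block N+1 in this model, which matches the "even if half of the validators attest to Block N+2" case in the walkthrough.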

This attack can be largely mitigated if the majority of validators use proposer boosting. Validators would then prioritise Block N+2, since it is released on time, over a block from an earlier slot that is released late.

[Image not available]

Proposer boost does not protect the network from all reorgs. If the next proposer, at slot N+3, happens to be adversarial (or the attacker gets to propose 2 blocks in a row), they can successfully reorg Block N+2 even if it receives the majority of attestations in slot N+2, as shown above. The reason is that the attacker can release Block N+3 on time, on top of Block N+1, and receive the same boost that Block N+2 received. As a result, Block N+1 will be heavier than Block N+2, as it inherits the weight of Block N+3 in addition to the (private) attestations made at slot N+1 and slot N+2. This attack can be executed successfully if the attacker controls 7% or more of the attestations in every slot, given that the proposer boost weight is 80%.
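
The 7% figure can be reproduced with a simple weight comparison. The model below is an assumption made for illustration, not the fork choice specification: each slot's attesters are normalised to 1, the attacker holds a share b of every slot, all honest attesters in slot N+2 vote for Block N+2, and the boost for Block N+2 has expired by slot N+3 while Block N+3 (built on Block N+1) holds the current boost. The attacker's branch then weighs 2b plus the boost, against 1 - b for Block N+2, so the attack works when b > (1 - boost) / 3. The function names are hypothetical.

```python
def min_attacker_share(proposer_boost: float) -> float:
    """Smallest per-slot share b satisfying 2b + boost > 1 - b (strictly above)."""
    return (1.0 - proposer_boost) / 3.0


def reorg_succeeds(attacker_share: float, proposer_boost: float) -> bool:
    """Compare the two branches at slot N+3 under the assumptions above."""
    # Attacker attestations from slots N+1 and N+2, plus the boost on N+3.
    private_weight = 2.0 * attacker_share + proposer_boost
    # Honest attestations for Block N+2 from slot N+2; its boost has expired.
    public_weight = 1.0 - attacker_share
    return private_weight > public_weight


boost = 0.80
print(f"threshold at {boost:.0%} boost: {min_attacker_share(boost):.2%}")
print("7% attacker succeeds:", reorg_succeeds(0.07, boost))
print("6% attacker succeeds:", reorg_succeeds(0.06, boost))
```

Under these assumptions the threshold sits just under 7% for an 80% boost, and a smaller boost raises the share an attacker needs.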

Another way to reorg blocks would be through collusion. Since the proposer of Block N+1 is not trying to release the block on time, they can listen to the mempool for longer, allowing them to extract more MEV than they normally would. The attacker can then leave behind a chunk of the MEV captured in Block N+1 to incentivise or bribe the next proposer, at slot N+3, to build on the malicious chain. If the next block proposer realises there is more ETH to be made on the malicious fork, they will build the next block on time on top of Block N+1, effectively reorging Block N+2.

The next steps for next week

  • Conduct data analysis on reorgs and block & attestation latency using the live data captured from the Lighthouse client
  • Look for cheaper alternatives to AWS for running a beacon node