# Erigon "The Merge" Architecture
The core protocol itself will undergo a massive transformation. Erigon had to restructure itself as well (also for other reasons), and the current stable branch is going to be deprecated post-merge because of it. Let's go over what changed specifically. Some topics will be redundant, such as the different Gossip handling, etc...
## EL<->CL Interaction -- External Architecture
After "The Merge", gossip on the EL will be disabled, and instead we are going to use the P2P network only for syncing up to the chain tip. Once on the chain tip, the Consensus Layer will be the one telling us which new blocks to validate and which fork to follow, so no gossip is required in Erigon.
There are 3 possible EL<->CL interactions:
* `NewPayload`: the CL requests the EL to validate a block: verify the header, execute it, and check the correctness of the state root.
* `ForkchoiceUpdated`: the CL tells the EL which block (previously received with `NewPayload`) is now considered the canonical head, and the EL updates the main state to point to said block. `ForkchoiceUpdated` can also request building a block on the new canonical head, to be proposed later on.
* `GetPayload`: the CL requests a payload previously built via `ForkchoiceUpdated`, in order to submit a proposal.
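The three interactions above can be sketched as a tiny stub engine. This is a minimal sketch with illustrative names and simplified types; the real `engine_*` methods carry many more fields and Erigon's actual code looks nothing like this.

```go
package main

import "fmt"

// stubEngine sketches the three CL->EL Engine API interactions in
// drastically simplified form (illustrative only).
type stubEngine struct {
	head     string
	nextID   uint64
	payloads map[uint64]string // payload id -> built block hash
}

// NewPayload: verify the header, execute the block, check the state root.
func (e *stubEngine) NewPayload(blockHash string) string {
	_ = blockHash
	return "VALID" // assume validation succeeds in this sketch
}

// ForkchoiceUpdated: set the canonical head; optionally start building
// a block on top of it, returning an id for later retrieval.
func (e *stubEngine) ForkchoiceUpdated(headHash string, build bool) (string, uint64) {
	e.head = headHash
	if !build {
		return "VALID", 0
	}
	e.nextID++
	e.payloads[e.nextID] = "built-on-" + headHash
	return "VALID", e.nextID
}

// GetPayload: return the block previously built for a payload id.
func (e *stubEngine) GetPayload(id uint64) string {
	return e.payloads[id]
}

func main() {
	e := &stubEngine{payloads: map[uint64]string{}}
	fmt.Println(e.NewPayload("0xabc"))
	status, id := e.ForkchoiceUpdated("0xabc", true)
	fmt.Println(status, e.GetPayload(id))
}
```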
## Chain tip interaction (at a high level)
Once on the chain tip, the interactions should look like this: ![chain tip interactions](https://i.imgur.com/pKLkKio.png)
**Note: there are more cases, which I am going to cover later on; this is the case when we do not have a validator attached and we are just following the chain.**
## How Erigon handles these interactions
Execution Layers are expected to handle these interactions through JSON-RPC; however, Erigon and the JSON-RPC server (RPC daemon) are separate components, so the two communicate in a rather different way. All interactions go as follows: the RPC daemon receives the JSON request from the Consensus Layer, then sends it to the Erigon backend via gRPC; Erigon decodes the gRPC request and sends back an answer, and the RPC daemon relays the answer back to the Consensus Layer. Here is a visual 👇👇👇
## Internal Architecture
The way `NewPayload` and `ForkChoiceUpdated` are handled internally in Erigon is by the use of multiple separate components.
* Fork validator.
* Canonical fork pointer.
* Reverse PoS downloader.
* Block production engine.
## Fork Validator
The fork validator has one simple function: it takes a fork and returns whether it is `VALID`, `INVALID`, `ACCEPTED`, etc., these being the return status codes.
The fork validator first takes the fork and determines whether it can be validated with the data currently available. If fork validation is not feasible because of a lack of information, or if the fork reconnects to the canonical chain too far behind in the past, then we return `ACCEPTED` and do not perform full validation.
The information we need for full validation:
* The blocks that reconnect the fork head hash to the canonical chain.
* All non-canonical blocks cached within a 32-block range from the current canonical head.
* The fork head must be at most 32 blocks away from the current canonical head.
**Note: 32 blocks was chosen because it is the length of an epoch on the beacon chain.**
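The feasibility check above can be sketched as a small predicate. The function name, parameters, and shape are assumptions for illustration; Erigon's real check is woven through its stage loop.

```go
package main

import "fmt"

// maxForkDepth mirrors the 32-block window mentioned above
// (the length of a beacon chain epoch).
const maxForkDepth = 32

// canValidateFork returns true when full validation is feasible:
// all fork blocks are present and the fork reconnects to the canonical
// chain no more than maxForkDepth blocks behind the head.
// (Sketch: assumes reconnectionPoint <= canonicalHead.)
func canValidateFork(canonicalHead, reconnectionPoint uint64, haveAllBlocks bool) bool {
	if !haveAllBlocks {
		return false // missing data: caller returns ACCEPTED
	}
	return canonicalHead-reconnectionPoint <= maxForkDepth
}

func main() {
	fmt.Println(canValidateFork(1000, 990, true))  // within the window
	fmt.Println(canValidateFork(1000, 900, true))  // reconnects too far back
	fmt.Println(canValidateFork(1000, 999, false)) // missing blocks
}
```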
If enough information is present, the fork validator can begin the validation process by executing the **state transition process** and saving the resulting database diff in memory.
### State Transition process
In the PoW days, state transition was pretty simple: we receive the header and process it accordingly 👇👇👇
However, in PoS, the state transition needs to be kept in memory before being set into the canonical chain (after `ForkchoiceUpdated`), so it is more complicated. In other words, the full state transition is triggered by `NewPayload` + `ForkchoiceUpdated` rather than by the gossip network.
This is what happens if we first call a `NewPayload` and then a `ForkchoiceUpdated` to its hash immediately after.
So, in the scenario where we are synced and following the chain tip, this is what happens:
* NewPayload interaction
* The beacon chain notifies us of a new Payload (block).
    * We send the payload to the fork validator. Since we are on the chain tip, and assuming nothing unusual is going on, it immediately connects to the canonical chain, as the payload's parent hash is the head of the canonical chain.
    * We start the validation process: we recover senders, execute the block, and generate the Merkle trie data.
    * Everything is valid, so we take the database diff from the in-memory execution and save it for later.
* Send VALID to the beacon chain.
* ForkChoiceUpdated interaction
* The beacon chain tells us that the block we just validated is now the head of the canonical chain.
    * Do we have the hash of the payload anywhere? Yes, because we received it just a moment ago.
    * Is the block hash in the past? No, it is a new block which extends the canonical chain.
    * Is the block a side fork? No, since it is a natural extension of the canonical chain.
    * Let's then ask the fork validator if it has a database diff for that specific block, and yes, it has.
* Apply the diff to the canonical chain and to the database.
    * Process the extra stages, and send INVALID if the head was not set at the end of the cycle, VALID if it was.
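The `ForkchoiceUpdated` decision cascade above can be condensed into one function. This is a sketch with invented names; the in-memory diff here is just a string standing in for the real database mutation.

```go
package main

import "fmt"

// handleForkchoice sketches the decision cascade: do we know the hash,
// and does the fork validator hold a database diff for it?
// known tracks hashes seen via NewPayload; diffs holds in-memory diffs.
func handleForkchoice(head string, known map[string]bool, diffs map[string]string) string {
	if !known[head] {
		return "SYNCING" // never seen this hash: must download first
	}
	diff, ok := diffs[head]
	if !ok {
		return "INVALID" // head could not be set at the end of the cycle
	}
	// Apply the in-memory diff to the on-disk database and run the
	// remaining stages (elided in this sketch).
	_ = diff
	return "VALID"
}

func main() {
	known := map[string]bool{"0xabc": true}
	diffs := map[string]string{"0xabc": "state-diff"}
	fmt.Println(handleForkchoice("0xabc", known, diffs)) // VALID
	fmt.Println(handleForkchoice("0xdef", known, diffs)) // SYNCING
}
```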
### State transition with Side forks
Okay, when we talk about a chain in PoW, you would think about something like this:
That looks simple, right? Well, in PoS that is not the case: we can have side forks, which are not to be confused with uncles. These forks can have transactions and can be chosen as the canonical chain at any moment. Here is an example:
Yes, this is indeed a valid PoS chain, and yes we need to be able to:
* Access all of those forks at any moment in time.
* Build on top of all of those forks when required to.
* Revert back and forth to any of those forks when required.
* Set the canonical head to any of those forks, multiple times in a row if necessary.
All of this is handled by the fork validator within Erigon. What happens under the hood is that each new block supplied for validation will reconnect at some point to the canonical chain. Then, in memory, we reorg to the reconnection point and execute all the blocks between the reconnection point and the block supplied; after this simulation is performed, we execute the supplied block and check whether its resulting state is valid or not.
Here is an example referencing the above flow chart.
* The chain is at block N
* We extend the canonical chain until N+3 and declare it as the canonical head.
* We then receive a new payload for side fork block 1.
* The fork validator simulates its execution by reorging back to block N in memory, and then executing side fork block 1.
* Side fork block 1 turns out to be valid, and the simulation is discarded.
* We receive side fork block 2; we reorg to block N, execute side fork block 1, and then side fork block 2 on top of it.
* We go on like this until side fork block 4.
* We set the forkchoice to side fork block 4, and now side fork 4 is the new canonical chain while N+3 is the head of a side fork.
**Note: if we lack side fork segments, we will try to download them, and if we are fast enough, we validate them on the spot as if they were multiple `NewPayload`s. If we cannot do it in time, we return either SYNCING or ACCEPTED.**
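The walk-back that gathers the fork segment to replay can be sketched like this. Assumed names throughout; the cache is a plain map here, while Erigon keeps a richer structure.

```go
package main

import "fmt"

// block is a minimal stand-in for a cached block.
type block struct {
	hash, parent string
}

// forkBlocks walks the cache backwards from newPayload until it reaches a
// canonical block, returning the fork segment in execution order. The
// caller then reorgs (in memory) to the reconnection point and executes
// the segment block by block.
func forkBlocks(newPayload block, cache map[string]block, canonical map[string]bool) []block {
	segment := []block{newPayload}
	parent := newPayload.parent
	for !canonical[parent] {
		b, ok := cache[parent]
		if !ok {
			return nil // missing segment: caller returns ACCEPTED/SYNCING
		}
		segment = append([]block{b}, segment...)
		parent = b.parent
	}
	return segment
}

func main() {
	canonical := map[string]bool{"N": true}
	cache := map[string]block{
		"S1": {"S1", "N"},
		"S2": {"S2", "S1"},
	}
	// Receiving side fork block 3 replays S1, S2, S3 on top of N.
	for _, b := range forkBlocks(block{"S3", "S2"}, cache, canonical) {
		fmt.Println(b.hash)
	}
}
```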
The fork validator ensures this behaviour as follows:
* Keep a cache of recent blocks (canonical and/or belonging to side forks).
* Every time a block is marked as valid, save it in the cache of recent blocks.
* If a block reconnects in the past, gather from the cache all the blocks between the canonical chain reconnection point and the block, then reorg and build the side fork chain with the gathered blocks.
* Clean cached blocks when they are too far in the past (e.g. Erigon does it for 32 blocks upwards and downwards from the head).
* If any information is missing, return ACCEPTED.
* If we cannot find the parent of the block, do not return ACCEPTED; try to download it from the P2P network and return SYNCING instead.
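The cache cleanup rule can be sketched as a pruning pass over blocks keyed by height. Function and variable names are illustrative only.

```go
package main

import "fmt"

// pruneCache drops cached blocks outside the 32-block window around the
// head (32 upwards and downwards, as described above). The cache maps a
// block height to the block hashes cached at that height.
func pruneCache(cache map[uint64][]string, head uint64) {
	for height := range cache {
		// height < head-32, written to avoid uint64 underflow.
		tooOld := height+32 < head
		tooNew := height > head+32
		if tooOld || tooNew {
			delete(cache, height)
		}
	}
}

func main() {
	cache := map[uint64][]string{
		50:  {"0xa"}, // far in the past: pruned
		90:  {"0xb"},
		100: {"0xc"},
		140: {"0xd"}, // far in the future: pruned
	}
	pruneCache(cache, 100)
	fmt.Println(len(cache)) // heights 90 and 100 survive
}
```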
#### When should we validate side forks?
* If we finish the header download process and the fork is close enough to the head of the chain.
* If we are requested to start building on a side fork.
* If we start the download and process it quickly enough that we can just perform validation (in Erigon, within 100ms).
* Side fork handling will happen a lot while the Consensus Layer is going through initial sync, especially with Nimbus and Teku.
**Note: the whole side fork process could be replaced by simply returning ACCEPTED, per the specs. But if we do, the Consensus Layer will go into optimistic sync and the execution layer will stall until the epoch is finalized (usually around 8 minutes).**
Here is a flow chart of its functioning 👇👇👇
## Canonical Fork Pointer
The canonical fork must be the one stored in the on-disk database, not in memory, and we should have a pointer pointing to it, with all of the canonical markers behind it.
The pointer is moved through `ForkchoiceUpdated`, and when moved it must update the main database. **IMPORTANT: the canonical fork pointer also determines which block is "latest" within the RPC daemon.**
In Erigon, the pointer is defined as three extra database entries, one for each field of the `ForkchoiceUpdated` input: the head block hash, the safe block hash, and the finalized block hash.
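The pointer can be sketched as three key/value entries mirroring the forkchoice state fields from the Engine API spec. The key names and the flat map are illustrative, not Erigon's actual table layout.

```go
package main

import "fmt"

// db stands in for the on-disk key/value store.
var db = map[string]string{}

// setForkchoice moves the canonical fork pointer by writing the three
// entries corresponding to the ForkchoiceUpdated input fields.
func setForkchoice(head, safe, finalized string) {
	db["headBlockHash"] = head
	db["safeBlockHash"] = safe
	db["finalizedBlockHash"] = finalized
}

// latestBlock is what the RPC daemon would resolve "latest" to.
func latestBlock() string { return db["headBlockHash"] }

func main() {
	setForkchoice("0xaaa", "0xbbb", "0xccc")
	fmt.Println(latestBlock()) // 0xaaa
}
```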
## Reverse PoS header downloader
In case we receive a block from the beacon chain whose parent we do not have, we want to download it. Since we have access to the parent hash, we can download blocks backwards, checking each block's hash against the parent hash of the block downloaded before it: validating and downloading backwards. **In the PoS world, during the downloading process we still want to validate incoming requests from the Consensus Layer, so the downloader component works in parallel with the main stage loop.**
In other words, let's assume we have a chain of block hashes 0x1 -> 0x2 -> 0x3 -> 0x4 -> 0x5.
We do not have any of these blocks, but the Consensus Layer sends us 0x5, with which we start the downloading process.
So we receive 0x5 and save it; we request its parent 0x4; we get 0x4, save it, and request its parent 0x3; and so on until we reconnect to the canonical chain, after which we flush the blocks to the database **BUT DO NOT MARK THEM AS CANONICAL, AS THAT ONLY HAPPENS VIA FORKCHOICEUPDATED**.
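The backwards loop can be sketched as follows, with a `fetch` callback standing in for a P2P header request. All names are assumptions for illustration.

```go
package main

import "fmt"

// header is a minimal stand-in for a downloaded block header.
type header struct {
	hash, parent string
}

// downloadBackwards walks from the tip hash the CL gave us, fetching each
// parent and verifying the hash link, until it reconnects to a known
// (already stored) block. The returned headers are flushed to the DB but
// NOT marked canonical; that only happens via ForkchoiceUpdated.
func downloadBackwards(tip string, fetch func(hash string) (header, bool), known map[string]bool) []header {
	var headers []header
	for cursor := tip; !known[cursor]; {
		h, ok := fetch(cursor)
		if !ok || h.hash != cursor {
			return nil // header unavailable or peer misbehaved
		}
		headers = append(headers, h)
		cursor = h.parent // validate-and-download backwards
	}
	return headers
}

func main() {
	net := map[string]header{
		"0x5": {"0x5", "0x4"},
		"0x4": {"0x4", "0x3"},
		"0x3": {"0x3", "0x2"},
	}
	fetch := func(h string) (header, bool) { hd, ok := net[h]; return hd, ok }
	known := map[string]bool{"0x2": true} // 0x2 is already on our chain
	got := downloadBackwards("0x5", fetch, known)
	fmt.Println(len(got)) // 0x5, 0x4, 0x3
}
```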
## Block production engine
Erigon is also responsible for producing blocks if run as a validator's execution layer. Block proposal is triggered by an additional parameter in `ForkchoiceUpdated`, which instructs the Execution Layer to build a block on the given fork.
Once the block has been built, an id is assigned to it, and we give the id to the beacon client, which can then query the block with `GetPayload` using that id.
Block production must happen as a parallel process, and if we cannot finish producing the block in time, we return a block with 0 transactions.
So here is an example:
* The CL sends Erigon a `ForkchoiceUpdated` specifying that we need to build a block.
* Erigon processes the `ForkchoiceUpdated`, initializes the block production process, and sends back an id.
* In parallel, Erigon builds the block using the mining stages and the memory mutation, so as not to stall the stage loop.
* The CL sends a `GetPayload` with the id provided before, and Erigon provides the block produced.
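The payload-id handoff above can be sketched like this, including the fallback to an empty block. Names are illustrative; in particular, building here is synchronous, whereas Erigon runs it in parallel with the stage loop.

```go
package main

import "fmt"

// payload stands in for a built execution payload.
type payload struct {
	txs []string
}

// producer tracks payloads by the id returned from ForkchoiceUpdated.
type producer struct {
	nextID   uint64
	payloads map[uint64]payload
}

// startBuilding assigns a payload id and "builds" the block; in Erigon
// this runs in parallel so the id can be returned immediately.
func (p *producer) startBuilding(txs []string) uint64 {
	p.nextID++
	p.payloads[p.nextID] = payload{txs: txs}
	return p.nextID
}

// getPayload returns the built block for an id, falling back to a block
// with 0 transactions rather than defaulting on a known id.
func (p *producer) getPayload(id uint64) payload {
	if pl, ok := p.payloads[id]; ok {
		return pl
	}
	return payload{} // not ready in time: 0 transactions
}

func main() {
	p := &producer{payloads: map[uint64]payload{}}
	id := p.startBuilding([]string{"tx1", "tx2"})
	fmt.Println(len(p.getPayload(id).txs)) // 2
}
```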
### Extra Notes
* Erigon must keep the recent payloads it has built.
* Erigon needs to switch to side forks flawlessly and be able to build on any of them.
* Erigon must never default on a payload id unless it was never set in the first place (rather, send a block with 0 transactions).
### Flowchart of the block production engine
Below is the block production flowchart for the block production engine 👇👇👇
And for when the beacon client sends us a `GetPayload` request 👇👇👇
For more information, check the [specs](https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md).