# Optimistic sync edge cases with the new Engine API semantics
This document analyses a few edge cases, taking into account the [new Engine API semantics](https://github.com/ethereum/execution-apis/pull/165) and the [corresponding changes](https://github.com/ethereum/consensus-specs/pull/2820) to the optimistic sync spec.
In this document we distinguish two types of EL clients with respect to the number of state copies kept in storage:
* **Deep-state clients.** Keep up to `N` most recent states in storage. These clients are capable of validating payloads on multiple forks at the same time because the requisite parent state is usually available. Therefore, a client of this type may instantly validate a payload that belongs to a side branch, provided its parent state is available.
We assume a parent state on a side branch is available if CL is lock-stepping and a payload's `blockNumber` is within a range `[HEAD-N; HEAD]`.
If the parent state is unavailable, we assume that a client of this type responds with `ACCEPTED` status to a `newPayload` call. A subsequent `forkchoiceUpdated` call designating a previously `ACCEPTED` payload as canonical will trigger payload validation which, in turn, induces a sync process if needed.
* **Shallow-state clients.** Maintain only the most recent version of the state, i.e. the post-state of the head of the canonical chain. Every `newPayload` call submitting a payload that extends a side branch of the block tree is answered with `ACCEPTED` status. Validation of payloads from side branches happens only when they become canonical, i.e. when the corresponding `forkchoiceUpdated` method call is received.
Usually, clients of this type are capable of doing short re-orgs in a reasonable amount of time. We identify three possible scenarios of re-org handling in a shallow-state client:
* *Short range re-org.* Involves a few blocks and completes, say, in under a second on commodity hardware. In this scenario client software may *synchronously* i) revert the state to a common ancestor and ii) execute a few blocks on top of it to reach the head of the chain designated by `forkchoiceUpdated`. The result of this method call will be either `VALID` or `INVALID`, instantly informing CL of the validity status of the payloads
* *Long range re-org.* Involves a decent number of blocks and can't be done in a sub-second interval. In this scenario client software starts executing previously cached payloads *asynchronously* and responds with `SYNCING` status to the corresponding `forkchoiceUpdated` call, leaving it to CL to check the status of these payloads later on
* *Unknown range re-org.* This scenario occurs when the client software is missing ancestors of the payload in question. In this case a sync process pulling the required information from the network is started, and the `forkchoiceUpdated` method call is answered with `SYNCING` status. CL has to check the status later on, as in the previous scenario.
Note that shallow-state client implementation may not support *synchronous* short range re-orgs at all and do all types of re-orgs in an *asynchronous* fashion.
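The three re-org scenarios above can be summarised as a small decision function. This is only a sketch under stated assumptions: `shallow_fcu_status`, its parameters and the `SYNC_REORG_MAX_DEPTH` threshold are hypothetical names introduced for illustration, and a real client would decide what counts as "short" from timing rather than from a fixed block count.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    VALID = "VALID"
    INVALID = "INVALID"
    SYNCING = "SYNCING"


# Hypothetical threshold: the deepest re-org the client executes synchronously.
SYNC_REORG_MAX_DEPTH = 3


def shallow_fcu_status(
    reorg_depth: Optional[int],
    payloads_valid: bool = True,
    supports_sync_reorg: bool = True,
) -> Status:
    """Status a shallow-state client returns to forkchoiceUpdated.

    reorg_depth is the number of blocks to execute on top of the common
    ancestor, or None when ancestors of the new head are missing.
    """
    if reorg_depth is None:
        # Unknown range re-org: a network sync is started.
        return Status.SYNCING
    if not supports_sync_reorg or reorg_depth > SYNC_REORG_MAX_DEPTH:
        # Long range re-org, or a client that does every re-org asynchronously.
        return Status.SYNCING
    # Short range re-org: revert, re-execute and answer synchronously.
    return Status.VALID if payloads_valid else Status.INVALID
```

With these assumptions, `shallow_fcu_status(2)` yields `Status.VALID`, while `shallow_fcu_status(50)` and `shallow_fcu_status(None)` both yield `Status.SYNCING`.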
From the CL standpoint, shallow-state clients that support synchronous short-range re-orgs and deep-state clients have an important difference. Let's illustrate this difference with the following example:
```
A <- B <- C (a)
 \
  <- B' <- C' (b)
```
CL is lock-stepping, (a) is the canonical branch at the beginning, and the network re-orgs to (b) after block `C'` is proposed and voted for.
The behaviour of a node depends on the type of EL client it runs:
* With a shallow-state client supporting synchronous short-range re-orgs, a node applies blocks `B'` and `C'` to its fork choice state *optimistically*. EL validates the payloads of `B'` and `C'` when processing the `forkchoiceUpdated(head: C')` call and synchronously responds with `VALID` to this call.
* With a deep-state client, a node applies blocks `B'` and `C'` to its fork choice state *non-optimistically*, as EL validates the payloads of these blocks on the corresponding `newPayload` calls. EL switches the head to the already validated block `C'` while processing the `forkchoiceUpdated(head: C')` call.
In both cases a node re-orgs to (b) without staying in optimistic mode, and the VC served by this node keeps doing its duties seamlessly. The difference in this case, namely applying a block to the fork choice state in an *optimistic* or *non-optimistic* way, doesn't matter much after the Merge transition. But, as the analysis below shows, it matters a lot during the transition.
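The example can be replayed as a sequence of Engine API responses. The sketch below is illustrative only: `reorg_trace` and `optimistically_imported` are hypothetical helpers, the statuses are the ones assumed in this document (all payloads valid, short-range synchronous re-org supported), and no real client is queried.

```python
from typing import List, Tuple

Trace = List[Tuple[str, str]]


def reorg_trace(client: str) -> Trace:
    """(call, status) pairs for the re-org from (a) to (b).

    client is "deep" or "shallow"; a shallow client is assumed to support
    synchronous short-range re-orgs.
    """
    # A deep-state client validates side-branch payloads right away,
    # a shallow-state client only acknowledges them.
    side_status = "VALID" if client == "deep" else "ACCEPTED"
    return [
        ("newPayload(B')", side_status),
        ("newPayload(C')", side_status),
        ("forkchoiceUpdated(head: C')", "VALID"),
    ]


def optimistically_imported(trace: Trace) -> List[str]:
    """Payloads that CL had to apply to fork choice without a VALID verdict."""
    return [
        call
        for call, status in trace
        if call.startswith("newPayload") and status != "VALID"
    ]
```

Here `optimistically_imported(reorg_trace("deep"))` is empty, whereas for `"shallow"` it contains both `newPayload` calls, even though both traces end with a `VALID` response to `forkchoiceUpdated`.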
## Optimistic sync
The optimistic sync spec gives the following definition to optimistically imported blocks:
> blocks which have only received a `NOT_VALIDATED` designation from an execution engine (i.e., they are not known to be `INVALIDATED` or `VALID`).
The spec allows importing a block optimistically only if either of the following conditions is met:
> 1. The justified checkpoint has execution enabled.
> 2. The current slot (as per the system clock) is at least `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` ahead of the slot of the block being imported.
If the first condition is met, the transition block is justified, and for simplicity we treat the scenarios related to this case as post-transition scenarios. Strictly speaking, post-transition scenarios require the transition block to be finalized, but this simplification doesn't affect the analysis that follows.
Scenarios in which *only* the second condition, or neither of the two conditions, is met belong to the transition kind of scenarios.
The optimistic sync spec has the following adverse effect if a node uses a shallow-state client: *a node running a shallow-state EL client can't import a block from a side chain until either of the optimistic block import conditions is met.*
Put simply, CL has to import every side chain block optimistically if it communicates with a shallow-state EL client, but the optimistic sync spec forbids this if neither of the conditions above is met.
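A minimal sketch of the two import conditions and of their effect on a CL node follows. The function names and boolean inputs are hypothetical simplifications introduced here, and the value `128` for `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` is an assumption; the authoritative definitions live in the optimistic sync spec.

```python
# Assumed value for illustration; take the real one from the spec.
SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY = 128


def can_import_optimistically(
    justified_has_execution: bool, current_slot: int, block_slot: int
) -> bool:
    """True if either of the two optimistic import conditions holds."""
    if justified_has_execution:
        return True
    return current_slot >= block_slot + SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY


def import_decision(
    el_status: str,
    justified_has_execution: bool,
    current_slot: int,
    block_slot: int,
) -> str:
    """What CL may do with a block given the EL status of its payload."""
    if el_status == "VALID":
        return "import"
    if el_status == "INVALID":
        return "reject"
    # ACCEPTED or SYNCING: only an optimistic import is possible.
    if can_import_optimistically(justified_has_execution, current_slot, block_slot):
        return "import optimistically"
    return "defer"
```

A shallow-state client answers `ACCEPTED` for every side chain payload, so during the transition `import_decision("ACCEPTED", False, 11, 10)` gives `"defer"`, while a deep-state client that answers `VALID` lets the same block through immediately.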
## Transition edge cases
Let's recall a property of terminal blocks that is important to the analysis of these edge cases. There may be several terminal blocks. Most likely they will be children of the same pre-terminal block, but more complicated cases are also possible, e.g. with a couple of pre-terminal blocks and one grand-terminal block, and so forth.
Read the Security Considerations section of [EIP-3675](https://eips.ethereum.org/EIPS/eip-3675#ability-to-jump-between-terminal-pow-blocks) for more explanation of the importance of re-orgs from one terminal block to another.
### Canonical transition block is missing data
A transition block that extends the canonical chain contains a payload built atop an unknown or not yet executed terminal block.
In this scenario both types of clients will respond with `ACCEPTED` status to the corresponding `newPayload` call, and CL won't be able to import this block until `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` runs out or the transition is justified.
### Transition payload is from a side chain
There are two potential ways to end up in this scenario:
* The payload of a proposed transition block is built on terminal block `B`, while terminal block `A` is the head of the canonical chain as observed by the local EL client.
* The payload of a proposed transition block `A` is built on the canonical head as observed by the local EL client. Then another transition block `B` gets proposed with a payload built on the tip of a side branch, and the network favours `B` over `A`.
We assume that in both of these cases the ancestors of terminal blocks `A` and `B` are known to the local EL client, and a short-range re-org is enough to jump between these blocks.
Deep-state clients will seamlessly re-org to `B` in this case, while shallow-state clients will not import `B` and will stay on `A`'s chain until `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` runs out or `A`'s chain is justified.
If `>= 2/3` of validators are running deep-state clients, then the transition will be justified. By importing block `B` optimistically, shallow-state clients will then catch up with the canonical chain and pass the transition after a short period of sticking to a side chain, which results in validator penalties.
If the majority of validators are running shallow-state clients, the transition becomes highly dependent on network conditions. In the worst case there will be a chain split that resolves within the next `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` slots, after which nodes will import side chain blocks optimistically and eventually converge on the same canonical chain.
Note that in this case the mechanism preventing [Fork Choice Poisoning](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#fork-choice-poisoning) still works, as a split at a transition block won't affect justification of a checkpoint that existed before the split. But one more assumption must be made for it to work:
> The chain must provide enough block space to accommodate attestations made on other forks, so that the epoch in question can be justified within `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` slots.
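
To get a feel for how long the worst-case split described above may last, here is a small worked calculation. Both constants are assumptions made for illustration (`128` slots for `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` and a 12-second slot); substitute the values of the network in question.

```python
# Assumed values for illustration only.
SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY = 128
SECONDS_PER_SLOT = 12


def earliest_optimistic_import_slot(
    block_slot: int, safe_slots: int = SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY
) -> int:
    """First slot at which the second import condition holds for a block."""
    return block_slot + safe_slots


def worst_case_split_seconds(
    safe_slots: int = SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY,
    seconds_per_slot: int = SECONDS_PER_SLOT,
) -> int:
    """Upper bound on the wait before side chain blocks become importable."""
    return safe_slots * seconds_per_slot
```

Under these assumptions a side chain transition block proposed at slot 1000 becomes importable at slot 1128, i.e. after 1536 seconds or roughly 25.6 minutes.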
## Post-transition edge cases
### Canonical payload is missing data
If a payload extending the canonical chain can't be validated because requisite data is missing, then both types of client will respond with `SYNCING` status to the `newPayload` calls, making a node *optimistic*.
The CL client then has to periodically check for status updates until the syncing process is finished. The default strategy for these checks is simply submitting `newPayload` and `forkchoiceUpdated` when subsequent payloads are received from the network and looking at the status in the responses.
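The default checking strategy can be sketched as a loop over incoming payloads. `StubEngine` is a hypothetical stand-in for an EL client that stays in `SYNCING` for a fixed number of payloads; it is not an Engine API binding, and `follow_chain` is an illustrative name rather than a function from any CL client.

```python
from typing import List, Tuple


class StubEngine:
    """Hypothetical EL stand-in: SYNCING for the first `syncing_for` payloads."""

    def __init__(self, syncing_for: int) -> None:
        self.syncing_for = syncing_for
        self.seen = 0

    def _status(self) -> str:
        return "SYNCING" if self.seen <= self.syncing_for else "VALID"

    def new_payload(self, payload: str) -> str:
        self.seen += 1
        return self._status()

    def forkchoice_updated(self, head: str) -> str:
        return self._status()


def follow_chain(engine: StubEngine, payloads: List[str]) -> List[Tuple[str, bool]]:
    """Submit each payload as it arrives and record whether the node is optimistic."""
    history = []
    for payload in payloads:
        engine.new_payload(payload)
        status = engine.forkchoice_updated(payload)
        # The node stays optimistic until the EL returns a definite verdict.
        history.append((payload, status in ("SYNCING", "ACCEPTED")))
    return history
```

For example, `follow_chain(StubEngine(2), ["P1", "P2", "P3"])` reports the node as optimistic for `P1` and `P2` and as non-optimistic once `P3` is answered with `VALID`.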
### Re-org to a side chain
This is the case when a payload, or several payloads in a row, come from a side chain and the node then re-orgs to this chain.
Shallow-state clients will respond with `ACCEPTED` status to every `newPayload` call submitting payloads from a side branch. The CL client will switch the node to optimistic mode by making the re-org. The next steps depend on the EL client implementation and the re-org depth:
* In case of a long or unknown range re-org, a shallow-state client responds with `SYNCING` to the `forkchoiceUpdated` method call and leaves the node in optimistic mode, with CL having to check for EL status updates
* If a shallow-state client supports short-range re-orgs, i.e. does them synchronously, then it will respond with `VALID/INVALID` status to the corresponding `forkchoiceUpdated` call. The node thus turns optimistic mode off upon receiving the response, and the VC served by the node is able to continue doing its duties. The node spends very little time in optimistic mode in this case: literally the period between CL changing its head to a side chain block and receiving the response to the corresponding `forkchoiceUpdated` call
* If short-range re-orgs aren't supported, then EL responds with `SYNCING` status as in the case of a long-range re-org, leaving the node in optimistic mode and unable to serve the VC until the branch is validated. As an optimisation for this case, CL may poll EL for a status update on a sub-slot time interval by calling `forkchoiceUpdated` with the same data.
Deep-state clients will respond with `VALID/INVALID` status to the `newPayload` call if the parent state of a payload exists locally (the payload's `blockNumber` is in the `[HEAD-N; HEAD]` range), thus allowing the node to re-org without entering optimistic mode.
In case of long-range (when `blockNumber` is lower than `HEAD-N` and the parent state is missing) or unknown-range (when ancestor blocks are missing) re-orgs, a deep-state client responds with `SYNCING`, and CL will have to check for updates of the EL status and turn off optimistic mode once EL responds with `VALID/INVALID`.
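The deep-state case boils down to a range check on `blockNumber`. The sketch below encodes the `[HEAD-N; HEAD]` assumption made earlier in this document; `parent_state_available` and `reorg_mode` are hypothetical helper names, and a real client would consult its state database instead of comparing numbers.

```python
def parent_state_available(
    head_number: int, n: int, block_number: int, ancestors_known: bool = True
) -> bool:
    """Assume the parent state exists iff blockNumber is within [HEAD-N; HEAD]."""
    if not ancestors_known:
        # Unknown-range re-org: ancestor blocks are missing altogether.
        return False
    return head_number - n <= block_number <= head_number


def reorg_mode(
    head_number: int, n: int, block_number: int, ancestors_known: bool = True
) -> str:
    """How a node with a deep-state client goes through a re-org."""
    if parent_state_available(head_number, n, block_number, ancestors_known):
        # newPayload is answered with VALID/INVALID right away.
        return "non-optimistic"
    # Long or unknown range: CL waits for EL to finish validation.
    return "optimistic"
```

With `HEAD = 100` and `N = 8`, a side chain payload at block 95 is handled non-optimistically, while one at block 90, or one with missing ancestors, puts the node into optimistic mode.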