Being Pessimistic About Optimistic Sync

Overall Concerns

The biggest risk with optimistic sync is that it's a complicated mechanism that requires large parts of the beacon node, and in some cases even the validator client, to be aware of the details of optimistic sync state. This level of complexity makes ongoing maintenance a challenge and increases the cost of any future development work. It also significantly increases the likelihood of bugs, and those bugs are likely to have significant security impacts for at least the local node and potentially the chain as a whole.

Gossip Subscriptions

In order for the EL to sync, it needs to be kept up to date with the chain head. Thus, the beacon node must subscribe to gossip in order to receive blocks and attestations so it can follow the correct chain head.

However, the beacon node must not forward gossip for any block or attestation that refers to something that has only been optimistically synced. In practice, this means the beacon node will effectively censor all gossip. Currently, this would result in the node being downscored. To support optimistic sync we'd have to tolerate nodes that censor all gossip, reducing the effectiveness and safety of the gossip network.
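The forwarding rule described above can be sketched roughly as follows. This is only an illustration: the `Validation` enum, the `GossipMessage` type and the `should_forward` function are hypothetical names, not taken from any client or spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Validation(Enum):
    FULLY_VALIDATED = "fully_validated"
    OPTIMISTIC = "optimistic"  # imported, but the EL has not yet verified the payload


@dataclass(frozen=True)
class GossipMessage:
    # Root of the block that this block or attestation refers to (hypothetical field).
    referenced_block_root: str


def should_forward(msg: GossipMessage, block_status: Dict[str, Validation]) -> bool:
    """Forward only messages that refer to a fully validated block."""
    status = block_status.get(msg.referenced_block_root)  # None if the block is unknown
    return status is Validation.FULLY_VALIDATED


# While the EL is syncing, recent blocks are only optimistic, so their gossip is dropped.
statuses = {"0xaa": Validation.FULLY_VALIDATED, "0xbb": Validation.OPTIMISTIC}
forwarded = [should_forward(GossipMessage(root), statuses) for root in ("0xaa", "0xbb", "0xcc")]
```

If every block near the head is `OPTIMISTIC`, `should_forward` returns `False` for all current gossip, which is the "censor everything" behaviour the paragraph above is worried about.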

Complexity of when to perform duties

There is a significant amount of complexity and uncertainty about when the beacon node should allow attached validator clients to perform duties and when it should not. Initially, it was thought the node should not perform any duties while its optimistic head was different to its strict head, but this potentially leads to a liveness failure on the chain if the TTD block is not published.

Solutions to this have been proposed (though I'm not entirely sure I can list the exact conditions), but they mean a bunch of special-case handling where behaviour differs based on whether the merge block is finalized, how far ahead optimistic sync is, etc. That complexity will be hard to remove even after the merge.

Pervasiveness of changes

Optimistic sync isn't just complex, it's also pervasive - affecting many different parts of the beacon node and validator client.

The initial write up from Paul Hauner includes a set of scenarios for how different components are to operate with optimistic sync, showing it touches:

Block/Attestation/Sync committee production
REST API
P2P networking

As previously mentioned, there is also an impact on gossip handling, and the sync component is obviously affected too.
The actual implementation reaches into fork choice, and the core state transition logic needs to handle SYNCING responses from the EL.
Storage also needs to be updated to track which blocks are fully validated and which are only optimistically validated (affecting both finalized and non-finalized storage).
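As a rough illustration of the storage change, the sketch below records whether each imported block was fully or only optimistically validated, based on the EL's response. The `BlockStore` class and its methods are hypothetical, and the three statuses are a simplification of what an EL can actually return.

```python
from enum import Enum
from typing import Dict, Optional


class PayloadStatus(Enum):
    # Simplified set of EL responses, assumed for this sketch.
    VALID = "VALID"
    INVALID = "INVALID"
    SYNCING = "SYNCING"


class BlockStore:
    """Hypothetical store that remembers which blocks are only optimistic."""

    def __init__(self) -> None:
        self._optimistic: Dict[str, bool] = {}  # block root -> is optimistic

    def import_block(self, root: str, el_status: PayloadStatus) -> bool:
        """Import a block unless the EL rejected it; return True if imported."""
        if el_status is PayloadStatus.INVALID:
            return False
        # SYNCING means the EL could not verify the payload yet.
        self._optimistic[root] = el_status is PayloadStatus.SYNCING
        return True

    def mark_validated(self, root: str) -> None:
        """Record that the EL has since verified this block's payload."""
        if root in self._optimistic:
            self._optimistic[root] = False

    def is_optimistic(self, root: str) -> Optional[bool]:
        """Return None for unknown blocks, otherwise the optimistic flag."""
        return self._optimistic.get(root)


store = BlockStore()
store.import_block("0x01", PayloadStatus.VALID)
store.import_block("0x02", PayloadStatus.SYNCING)
```

A real client would also have to persist this flag for both finalized and non-finalized blocks and update ancestors when a descendant is validated, which this sketch leaves out.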

Even where these components can just use the verified ancestor head and potentially not need code changes, different handling is often required when that verified ancestor head is finalized, and having a finalized chain head is a completely new state for beacon nodes to handle.

Beyond code, each of these touch points needs to be carefully evaluated for any security implications that may come from using the optimistic head, or from using a potentially very old chain head.

Tracking verified ancestor head isn't enough

While the initial write up of optimistic sync discusses which head to use (optimistic or verified ancestor), that by itself isn't necessarily enough. For example, the REST API allows retrieving blocks by root: should that API include only fully verified blocks, or optimistic blocks as well?

There are many places in the beacon node that check whether a block is known, and each of those now needs to be evaluated to determine whether an optimistically imported block is enough or whether the block is only considered known once it is fully validated.
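To make the ambiguity concrete, here is a minimal sketch in which every "is this block known?" check has to state which meaning it wants. The `BlockIndex` type and the `is_known` function are hypothetical and exist only for illustration.

```python
from typing import Dict

# Hypothetical in-memory index: block root -> True if fully validated,
# False if only optimistically imported.
BlockIndex = Dict[str, bool]


def is_known(index: BlockIndex, root: str, require_fully_validated: bool) -> bool:
    """Answer 'is this block known?' under one of the two possible meanings."""
    if root not in index:
        return False
    # A fully validated block is always known; an optimistic block is known
    # only to callers that accept optimistic imports.
    return index[root] or not require_fully_validated


index: BlockIndex = {"0x01": True, "0x02": False}
lenient = is_known(index, "0x02", require_fully_validated=False)
strict = is_known(index, "0x02", require_fully_validated=True)
```

The same optimistic block gives different answers depending on the flag, and each existing call site would need someone to decide which answer is the safe one.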

Doesn't just affect initial sync

We often think of optimistic sync as only affecting the initial sync process, but the EL may return SYNCING at any point, forcing the beacon node to fall back into optimistic sync mode. So any risk points from optimistic sync can't be mitigated by just verifying that the node is on the correct chain once the sync completes.
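The point can be illustrated with a tiny state machine. This is a sketch under assumptions: the `SyncMode` enum and `next_mode` function are hypothetical, and the rule that a VALID response for the head returns the node to strict mode is a simplification.

```python
from enum import Enum
from typing import List


class SyncMode(Enum):
    STRICT = "strict"          # head is fully validated by the EL
    OPTIMISTIC = "optimistic"  # head has only been optimistically imported


def next_mode(current: SyncMode, el_status: str) -> SyncMode:
    """Update the node's mode from the EL's latest response for the head."""
    if el_status == "SYNCING":
        return SyncMode.OPTIMISTIC  # can happen at any time, not only at startup
    if el_status == "VALID":
        return SyncMode.STRICT
    return current  # other responses are out of scope for this sketch


# The EL finishes its initial sync, then later starts syncing again
# (for example after its database is wiped).
responses = ["SYNCING", "SYNCING", "VALID", "VALID", "SYNCING"]
mode = SyncMode.OPTIMISTIC
history: List[SyncMode] = []
for status in responses:
    mode = next_mode(mode, status)
    history.append(mode)
```

The final transition back to `OPTIMISTIC` happens long after the initial sync completed, so a one-off check at the end of initial sync does not cover it.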

Some reasons the EL may return to SYNCING include the user deleting its database (to recover from data corruption) or the user switching to a different EL.

Doesn't go away

If optimistic sync were something we only needed to get through the merge, we could very carefully get the details right once and then remove it again. But it's actually a permanently required feature, so every future change to beacon node clients needs to carefully consider the implications of optimistic sync and deal with that extra complexity. That adds a significant amount of technical debt, which will slow down further enhancements.

Eventually, the extra complexity will lead to a bug being introduced in clients because optimistic sync wasn't correctly handled. Given that optimistic sync mostly involves the selection of the canonical chain, there's a high likelihood of such a bug causing a chain split or a security vulnerability, or causing the node to perform validator duties in a way that's unsafe for either the individual validator or the network as a whole.
