Light Client Sync

An alternative to optimistic sync, proposed by Paul Hauner, is to use a light client to track the chain head. The light client notifies both the EL and CL of updates so they can backfill blocks until the EL has a suitably recent world state available. The beacon node can then fully start and use the EL to fully validate all blocks. The beacon node would never need to optimistically import a block.

Key Advantages

Light client sync has a number of advantages, in general and when compared to optimistic sync. For details of the concerns about optimistic sync see Being Pessimistic About Optimistic Sync.

  • Guarantees that light clients are well supported. All nodes on the network will depend on light clients being able to access data quickly and easily.
  • Helps provide additional verification for checkpoint states.
  • The beacon node operation is largely unchanged.
    • All imported blocks are always fully valid.
    • When the light client is active, the beacon node just considers itself syncing in the same way it would when syncing today.
  • Minimal changes required to the EE API and EL behaviour.

Key Disadvantages

  • Light clients are fairly new and most clients don't yet have one. No client has completed an optimistic sync implementation either, but work on them has started. It will likely take longer to get the required development and infrastructure in place for light clients to work well, and this approach would make that work a requirement for the merge rather than a parallel effort.
  • The details of how light clients retrieve updates and data are still being developed and standardised. However, this is also true for optimistic sync.

Unexplored Problems

As Vitalik points out, light client sync carries more security assumptions, such as relying on committees that are much smaller than the whole validator set. There is also a de facto 1-day weak subjectivity assumption, since sync committees update daily. If the beacon chain has not finalised within a day, there could be an issue.
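As a rough check on the "daily" figure, the length of a sync committee period can be worked out from the Altair mainnet preset values. This is only arithmetic on those constants; the function name is made up for illustration.

```python
# Mainnet preset values from the Altair spec.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256


def sync_committee_period_seconds() -> int:
    """Wall-clock length of one sync committee period (hypothetical helper)."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD


period = sync_committee_period_seconds()
print(period, "seconds, about", round(period / 3600, 1), "hours")
```

So a period is a little over a day, which is where the roughly 1-day assumption comes from.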

Significantly more consideration needs to be given to the security assumptions of light clients, and to their suitability for syncing in this way, before light client sync is a production ready proposal.

Initial sync flow

Note: light client sync process is assumed to use the process outlined in the Altair spec.

The user would need to supply a starting point for the light client - this could be a full BeaconState as is currently used for checkpoint sync or ideally could be just a known block root such as a wss checkpoint to allow it to retrieve the initial LightClientSnapshot.

Three components are then involved in the sync process: the EL client, as today, and the CL client, which essentially has two components, the beacon node (BN) and the light client (LC). The CL components likely reside in the same process and share the networking layer, though this is not required.

Light Client

The light client then connects to the network to track broadcast LightClientUpdate messages. It's assumed that these will be available via libp2p gossip in some form (I believe work to define exactly how is still ongoing).

As the light client's view of the chain progresses, it would send forkChoiceUpdated messages to the EL and similar messages to the BN reporting the block root for the head and finalized blocks.

Note that this implies that the light client has a way to retrieve the execution payload block_hash value for the head and finalized blocks. This is available in the BeaconState so would need to be available for light clients to be at all useful regardless.
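A minimal sketch of this fan-out, assuming hypothetical interfaces for the EL and BN. None of the class or method names below come from a real client or from the EE API; they only illustrate that the BN is told about beacon block roots while the EL is told about execution payload block hashes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class HeadUpdate:
    """What the light client learns from an update (hypothetical type)."""
    head_root: str             # beacon block root of the head, for the BN
    finalized_root: str        # beacon block root of the finalized block
    head_block_hash: str       # execution payload block_hash of the head, for the EL
    finalized_block_hash: str  # execution payload block_hash of the finalized block


class ExecutionLayer(Protocol):
    def fork_choice_updated(self, head_block_hash: str, finalized_block_hash: str) -> str: ...


class BeaconNode(Protocol):
    def on_light_client_head(self, head_root: str, finalized_root: str) -> None: ...


class LightClient:
    def __init__(self, el: ExecutionLayer, bn: BeaconNode) -> None:
        self.el = el
        self.bn = bn

    def on_update(self, update: HeadUpdate) -> str:
        # Tell the BN which beacon blocks to backfill towards.
        self.bn.on_light_client_head(update.head_root, update.finalized_root)
        # Tell the EL which execution block to sync towards; return its status.
        return self.el.fork_choice_updated(
            update.head_block_hash, update.finalized_block_hash
        )
```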

Light Client Networking Details

The light client would discover and connect to libp2p peers. However, it would only subscribe to the light client related gossip topics, and its STATUS message would report it as being at the genesis state. It would thus not be able to serve any blocks requested via RPC.

Beacon Node

As the beacon node receives head updates from the light client it can begin backfilling blocks from the network. It would not at this stage process any blocks as the EL isn't able to fully validate them. Downloading blocks at this point is optional, but would enable the beacon node to more quickly begin fully participating in the network.

One complexity here is that the block root does not cover the signature, so the beacon node would need access to the validator public keys in order to check the block signatures. It could obtain these from an initial BeaconState, if provided by the user, or by requesting them from the light client.
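The root-checking part of backfilling can be sketched as a walk backwards from the head root reported by the light client. This is an illustration only: SHA-256 over a simple encoding stands in for the SSZ hash_tree_root, the Block type and fetch_block callback are hypothetical, and signature verification (which needs the validator public keys) is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Block:
    slot: int
    parent_root: bytes
    body: bytes


def block_root(block: Block) -> bytes:
    # Stand-in for hash_tree_root; a real client would use SSZ merkleization.
    data = block.slot.to_bytes(8, "little") + block.parent_root + block.body
    return hashlib.sha256(data).digest()


def backfill(
    head_root: bytes,
    fetch_block: Callable[[bytes], Optional[Block]],
    stop_root: bytes,
) -> List[Block]:
    """Download blocks from head_root back to (but excluding) stop_root.

    Each block is accepted only if its root matches the root we expected,
    so every returned block is an ancestor of the light client's head.
    Signatures are NOT checked here.
    """
    chain: List[Block] = []
    expected = head_root
    while expected != stop_root:
        block = fetch_block(expected)
        if block is None or block_root(block) != expected:
            raise ValueError("missing block or root mismatch at " + expected.hex())
        chain.append(block)
        expected = block.parent_root
    return chain
```

The blocks come back newest first; they would only be imported later, once the EL is able to validate them.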

Execution Layer Client

The EL continues to function as it does today, except it is initially receiving forkChoiceUpdated messages from the light client.

Transitioning out of initial sync

At some point the EL should complete sync and be ready to execute payloads. With the current EE API, a forkChoiceUpdated message will always result in a SYNCING response from the EL: because the EL isn't yet receiving executePayload calls, at best it has the parent block and won't have the new chain head. So either the forkChoiceUpdated API would need to be modified, or the light client or BN would need to periodically attempt to call executePayload with a payload close to or at the head.
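The second option, periodically probing with executePayload, might look something like the sketch below. The function, its parameters, and the retry interval are all assumptions made for illustration, and the status strings are placeholders for whatever the EE API actually returns.

```python
import time
from typing import Any, Callable


def wait_until_el_ready(
    latest_payload: Callable[[], Any],
    execute_payload: Callable[[Any], str],
    retry_interval: float = 12.0,   # assumed: roughly one slot
    max_attempts: int = 10,         # assumed: give up after this many probes
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the EL with a payload at or near the head until it can validate it.

    Returns True once the EL reports the payload as VALID, or False if it
    was still SYNCING after max_attempts probes.
    """
    for attempt in range(max_attempts):
        # Re-fetch each time: the head moves on while the EL is syncing.
        if execute_payload(latest_payload()) == "VALID":
            return True
        if attempt + 1 < max_attempts:
            sleep(retry_interval)
    return False
```

Passing the sleep function in makes the loop easy to exercise without real delays.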

Whichever approach is used, once the EL is able to validate blocks, the light client can stop and notify the beacon node that it is now in control.

The beacon node would then be able to fully validate blocks and could sync. In a naive implementation, the beacon node would begin executing blocks from the initial state the user originally provided to reach the chain head reported by the light client. If the EL didn't take long to sync, this is probably a reasonable number of blocks, but if downloading the world state was slow, this could be quite a few blocks.

A potential optimisation here would be to download a more recent BeaconState to start executing blocks from. This can be done trustlessly because the light client has provided the block root to use. The existing /eth/v2/debug/beacon/states/{state_id} endpoint could be used to access this state. If it's unavailable the BN can fall back to processing blocks from whatever initial state it does have.

To avoid the EL needing to sync further, the BN should call executePayload for every block after the first block confirmed to be known by the EL (that started the transition out of initial sync).

Handling the merge transition

If the CL is started with an initial state that has not yet completed the merge, it would start up as it does today and not activate the light client.

If the CL is started with an initial state that has completed the merge, it should issue a forkChoiceUpdated request to the EL. If the EL returns SUCCESS, it can start up and begin processing blocks normally; otherwise it should activate the light client.

The final case is if the CL starts up pre-merge but, when the merge block is processed, the EL returns SYNCING from the executePayload call. This is equivalent to the EL returning SYNCING at any later point, and would follow the process described in "After the initial sync".
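The two startup cases can be summarised as a small decision function. This is a sketch under the assumptions stated in the text; the enum and function names are hypothetical, and the third (runtime) case is not covered here.

```python
from enum import Enum
from typing import Callable


class StartupMode(Enum):
    NORMAL = "normal"              # start the beacon node as today
    LIGHT_CLIENT = "light_client"  # let the light client track the head first


def choose_startup_mode(
    state_is_post_merge: bool,
    fork_choice_updated: Callable[[], str],
) -> StartupMode:
    # Pre-merge initial state: nothing changes, the EL is not needed yet.
    if not state_is_post_merge:
        return StartupMode.NORMAL
    # Post-merge initial state: ask the EL whether it already knows the head.
    if fork_choice_updated() == "SUCCESS":
        return StartupMode.NORMAL
    return StartupMode.LIGHT_CLIENT
```

Note that the EL is only queried when the initial state is post-merge.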

Validating blocks prior to world state download

The EL would have backfilled blocks, so any block that's an ancestor of the available world state would already be known and the EL can immediately return valid. The CL should only be syncing blocks that are ancestors of the head determined by the light client, so all blocks would be known to be valid.

Potentially the CL could execute the payload of the block the light client reported as chain head and, once that's confirmed as valid, skip sending any execution payloads from ancestors to the EL while processing those blocks to update its BeaconState.

After the initial sync

If, at some point after the initial sync completes, the EL returns SYNCING, the beacon node needs to decide when it should cease its operations and switch back to the light client mode.

If the EL will only be syncing for a short period, the BN should continue running but won't be able to import new blocks. The BN would need to periodically retry executing the payload that caused the SYNCING response while later blocks will build up in the pending pool.

If the EL will be syncing for a long period, or the expected short sync winds up taking too long, the BN will need to reactivate the light client. It should unsubscribe from block, attestation and sync committee gossip topics. It should continue reporting the same STATUS and is able to serve requested blocks up to what it has imported.

The light client should take its initial LightClientSnapshot from the beacon node's state, subscribe to light client updates and resume processing them. The light client would provide forkChoiceUpdated events to the EL and BN in the same way as during initial sync.

Exiting this sync phase would be almost identical to exiting initial sync, except that the BN would likely process all blocks from the point it had reached until it arrives at the new chain head. It would be theoretically possible for the BN to download a new BeaconState and resume syncing from there, though this would likely require significant changes to most existing clients.

Choosing between short and long sync

It is currently difficult for the BN to know if the EL will take a long time to sync or not and whether the EL requires ongoing forkChoiceUpdated calls in order to perform that sync.

My understanding, however, is that the EL only needs ongoing forkChoiceUpdated notifications to help it download the world state. If it already has a suitable world state, it can follow the chain backwards from the unknown block to the world state it has and then execute forwards.

If the SYNCING response included information from the EL about whether it needed on-going head updates, this would make it simple for the BN to decide when to switch back to the light client. The EL would be allowed to initially respond SYNCING and indicate it didn't need on-going updates and then later indicate it did need them. This may happen if the EL has a world state it believes can be used but then downloads the ancestors of the required block and discovers they don't descend from that world state. The EL would then need to download a different world state and would require on-going head updates to facilitate that.
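With such an indication, the BN's decision could be as simple as the sketch below. The needs_head_updates field does not exist in the EE API; it stands for the extra information proposed above. The timeout for a short sync that drags on is likewise an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncingResponse:
    # Hypothetical field: True if the EL needs ongoing head updates
    # (i.e. it has to download a new world state).
    needs_head_updates: bool


def should_reactivate_light_client(
    response: SyncingResponse,
    seconds_syncing: float,
    short_sync_timeout: float = 384.0,  # assumed: one epoch on mainnet
) -> bool:
    """Decide whether the BN should hand control back to the light client."""
    # Long sync: the EL has said it needs head updates to fetch a world state.
    if response.needs_head_updates:
        return True
    # Short sync: keep running, unless it has gone on for too long.
    return seconds_syncing > short_sync_timeout
```

Because the decision is re-evaluated on every SYNCING response, an EL that first reports a short sync and later discovers it needs a different world state simply flips the flag and the BN switches over.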

Notably this distinction would also allow the BN to continue importing blocks from other forks while the EL is performing a "short sync". This neatly resolves the issue where the merge block includes an execution payload with a parent that hasn't actually been published. The EL would return SYNCING when that payload is executed, but would assume it could just use its existing world state. The BN would then continue processing blocks on other forks, potentially importing a merge block where the execution payload is available.

The BN would also not immediately enter its sync mode when the EL indicates it is missing parent blocks, but would only do so if it fell far enough behind its peers (as is the case today) or if it needed to switch to the light client. Thus the BN would naturally continue producing blocks and avoid having the chain stall.
