# Light Client Sync
An alternative to optimistic sync, proposed by Paul Hauner, is to use a light client to track the chain head. The light client notifies both the EL and CL of updates so they can backfill blocks until the EL has a suitably recent world state available. The beacon node can then fully start and use the EL to fully validate all blocks. The beacon node would never need to optimistically import a block.
## Key Advantages
Light client sync has a number of advantages, in general and when compared to optimistic sync. For details of the concerns about optimistic sync see [Being Pessimistic About Optimistic Sync](https://hackmd.io/HZnVYV-RQ4iv21VoJVTqhg).
* Guarantees that light clients are well supported. All nodes on the network will depend on light clients being able to access data quickly and easily.
* Helps provide additional verification for checkpoint states.
* The beacon node operation is largely unchanged.
* All imported blocks are always fully valid.
* When the light client is active, the beacon node just considers itself syncing in the same way it would when syncing today.
* Minimal changes required to the EE API and EL behaviour.
## Key Disadvantages
* Light clients are fairly new and most clients don't yet have one. No client has completed its optimistic sync implementation either, but work on those has started. It will likely take longer to get the required development and infrastructure in place for light clients to work well, and this proposal would make that work a requirement for the merge rather than a parallel effort.
* The details of how light clients retrieve updates and data are still being developed and standardised. However this is also true for optimistic sync.
## Unexplored Problems
As Vitalik points out, there are more security assumptions with light client sync, such as relying on committees that are smaller than the whole validator set. There is also a de facto 1-day weak subjectivity assumption, since sync committees update daily. So if the beacon chain has not finalised within a day, there could be an issue.
Significantly more consideration needs to be given to the security assumptions of light clients, and to their suitability for syncing, before light client sync is a production-ready proposal.
## Initial sync flow
*Note: light client sync process is assumed to use the process outlined in the [Altair spec](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/sync-protocol.md).*
The user would need to supply a starting point for the light client. This could be a full `BeaconState`, as is currently used for checkpoint sync, or ideally just a known block root, such as a wss checkpoint, from which it can retrieve the initial `LightClientSnapshot`.
There are then three components involved in the sync process: the EL client, as today, and two CL components, the beacon node (BN) and the light client (LC). The CL components likely reside in the same process and share the networking layer, though this is not required.
### Light Client
The light client then connects to the network to track broadcast `LightClientUpdate` messages. It's assumed that these will be available via libp2p gossip in some form (I believe work is still ongoing to define exactly how).
As the light client's view of the chain progresses, it would send `forkChoiceUpdated` messages to the EL and similar messages to the BN reporting the block root for the head and finalized blocks.
Note that this implies that the light client has a way to retrieve the execution payload `block_hash` value for the head and finalized blocks. This is available in the `BeaconState` so would need to be available for light clients to be at all useful regardless.
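As a rough illustration of the notification step, the sketch below forwards each new head/finalized pair to both the EL and the BN. Everything here is hypothetical: `HeadUpdate`, `LightClientNotifier` and the two callbacks are invented for the example and are not part of any spec or client. The EL callback stands in for a `forkChoiceUpdated` call and takes execution block hashes, while the BN callback takes beacon block roots.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HeadUpdate:
    """Hypothetical summary of the light client's current view."""
    head_block_root: bytes        # beacon block root of the head
    finalized_block_root: bytes   # beacon block root of the finalized block
    head_block_hash: bytes        # execution payload block_hash of the head
    finalized_block_hash: bytes   # execution payload block_hash of the finalized block


class LightClientNotifier:
    """Forwards head changes to the EL and the BN, skipping repeats."""

    def __init__(
        self,
        notify_el: Callable[[bytes, bytes], None],  # stand-in for forkChoiceUpdated
        notify_bn: Callable[[bytes, bytes], None],  # stand-in for the BN-side message
    ) -> None:
        self._notify_el = notify_el
        self._notify_bn = notify_bn
        self._last: Optional[HeadUpdate] = None

    def on_update(self, update: HeadUpdate) -> bool:
        """Returns True if the update was forwarded, False if it was a repeat."""
        if update == self._last:
            return False
        # The EL identifies blocks by execution block hash.
        self._notify_el(update.head_block_hash, update.finalized_block_hash)
        # The BN identifies blocks by beacon block root.
        self._notify_bn(update.head_block_root, update.finalized_block_root)
        self._last = update
        return True
```

Keeping the two identifier types side by side in one update makes the dependency explicit: the light client cannot drive the EL unless it can map each beacon block root to its `block_hash`.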
#### Light Client Networking Details
The light client would discover and connect to libp2p peers, but it would only subscribe to the light client related gossip topics, and its `STATUS` message would report it at the genesis state. It would thus not be able to serve any blocks requested via RPC.
### Beacon Node
As the beacon node receives head updates from the light client it can begin backfilling blocks from the network. It would not at this stage process any blocks as the EL isn't able to fully validate them. Downloading blocks at this point is optional, but would enable the beacon node to more quickly begin fully participating in the network.
One complexity here is that the block root does not cover the signature so the beacon node would need access to the validator public keys in order to check the block signatures. It could access these from an initial `BeaconState` if provided by the user or by requesting them from the light client.
### Execution Layer Client
The EL continues to function as it does today, except it is initially receiving `forkChoiceUpdated` messages from the light client.
## Transitioning out of initial sync
At some point the EL should complete sync and be ready to execute payloads. With the current EE API, the `forkChoiceUpdated` message will always result in a `SYNCING` response from the EL, because the EL isn't yet receiving `executePayload` calls, so at best it would have the parent block but not the new chain head. So either the `forkChoiceUpdated` API would need to be modified, or the light client or BN would need to periodically attempt to call `executePayload` with a payload close to or at the head.
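The periodic-probe option could look something like the sketch below. It is only an illustration under stated assumptions: `execute_payload` is a hypothetical callable that wraps the real `executePayload` request and returns its status string, `latest_payload` is a hypothetical callable returning a payload at or near the head, and any status other than `SYNCING` is taken to mean the EL was able to judge the payload. The 12 second default interval is just one slot and is an arbitrary choice.

```python
from typing import Any, Callable, Optional


def wait_until_el_ready(
    execute_payload: Callable[[Any], str],   # hypothetical wrapper around executePayload
    latest_payload: Callable[[], Any],       # hypothetical: payload at or near the head
    sleep: Callable[[float], None],          # injected so callers control the waiting
    interval_seconds: float = 12.0,          # assumption: probe roughly once per slot
    max_attempts: Optional[int] = None,      # None means keep probing indefinitely
) -> bool:
    """Probe the EL until it stops answering SYNCING.

    Returns True once the EL gives any non-SYNCING answer, or False if
    max_attempts probes were made and every one returned SYNCING.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        # Always probe with a fresh payload, since the head keeps moving.
        status = execute_payload(latest_payload())
        if status != "SYNCING":
            return True
        attempts += 1
        sleep(interval_seconds)
    return False
```

In a real client `sleep` would be `time.sleep` or an async equivalent. Injecting it keeps the loop easy to test and leaves the scheduling policy to the caller.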
Whichever approach is used, once the EL is able to validate blocks, the light client can stop and notify the beacon node that it is now in control.
The beacon node would then be able to fully validate blocks and could sync. In a naive implementation, the beacon node would begin executing blocks from the initial state the user originally provided to reach the chain head reported by the light client. If the EL didn't take long to sync this is probably a reasonable number of blocks but if downloading the world state was slow this could be quite a few blocks.
A potential optimisation here would be to download a more recent `BeaconState` to start executing blocks from. This can be done trustlessly because the light client has provided the block root to use. The existing `/eth/v2/debug/beacon/states/{state_id}` endpoint could be used to access this state. If it's unavailable the BN can fall back to processing blocks from whatever initial state it does have.
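To make the "trustless" claim concrete, here is a minimal sketch of the check the BN could run on a downloaded state. It relies on two facts from the consensus specs: a block root is the SSZ `hash_tree_root` of the block's `BeaconBlockHeader`, and that header commits to the post-state via `state_root`. The helper names are invented for this example. It assumes the BN has separately obtained the header fields for the trusted root and has computed `hash_tree_root` of the downloaded state with a full SSZ library (not shown).

```python
import hashlib

ZERO_CHUNK = b"\x00" * 32


def _hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def header_root(slot: int, proposer_index: int, parent_root: bytes,
                state_root: bytes, body_root: bytes) -> bytes:
    """SSZ hash_tree_root of a BeaconBlockHeader (five fields, padded to eight leaves)."""
    leaves = [
        slot.to_bytes(8, "little").ljust(32, b"\x00"),            # uint64 as a 32-byte chunk
        proposer_index.to_bytes(8, "little").ljust(32, b"\x00"),  # uint64 as a 32-byte chunk
        parent_root,
        state_root,
        body_root,
        ZERO_CHUNK, ZERO_CHUNK, ZERO_CHUNK,                        # pad to a power of two
    ]
    # Merkleize: hash adjacent pairs until a single root remains.
    while len(leaves) > 1:
        leaves = [_hash_pair(leaves[i], leaves[i + 1]) for i in range(0, len(leaves), 2)]
    return leaves[0]


def state_matches_trusted_root(trusted_block_root: bytes, slot: int, proposer_index: int,
                               parent_root: bytes, body_root: bytes,
                               downloaded_state_root: bytes) -> bool:
    """True if a header built around the downloaded state's root hashes to the trusted root.

    downloaded_state_root must be hash_tree_root(state) computed locally from the
    downloaded BeaconState, never a value reported by the serving node.
    """
    candidate = header_root(slot, proposer_index, parent_root, downloaded_state_root, body_root)
    return candidate == trusted_block_root
```

If the check fails, the BN discards the state and falls back to processing blocks from whatever initial state it already has.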
To avoid the EL needing to sync further, the BN should call `executePayload` for every block after the first block confirmed to be known by the EL (that started the transition out of initial sync).
## Handling the merge transition
If the CL is started with an initial state that has not yet completed the merge, it would start up as it does today and not activate the light client.
If the CL is started with an initial state that has completed the merge, it should issue a `forkChoiceUpdated` request to the EL. If the EL returns `SUCCESS`, the CL can start up and begin processing blocks normally; otherwise it should activate the light client.
The final case is if the CL starts up pre-merge but the EL returns `SYNCING` from the `executePayload` call when the merge block is processed. This case would follow the process in [after the initial sync](#After-the-initial-sync).
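The first two startup cases reduce to a small decision function, sketched below. `StartupMode` and `choose_startup_mode` are hypothetical names for this example, and `fork_choice_updated` is an assumed callable that issues the real `forkChoiceUpdated` request and returns its status string.

```python
from enum import Enum
from typing import Callable


class StartupMode(Enum):
    NORMAL = "normal"              # start the beacon node as today
    LIGHT_CLIENT = "light_client"  # activate the light client first


def choose_startup_mode(
    merge_complete: bool,
    fork_choice_updated: Callable[[], str],  # hypothetical wrapper around forkChoiceUpdated
) -> StartupMode:
    """Pick the startup path from the initial state and the EL's answer."""
    if not merge_complete:
        # Pre-merge initial state: the EL is not consulted at all.
        return StartupMode.NORMAL
    # Post-merge initial state: only SUCCESS lets the BN start normally.
    if fork_choice_updated() == "SUCCESS":
        return StartupMode.NORMAL
    return StartupMode.LIGHT_CLIENT
```

Treating every non-`SUCCESS` answer as a reason to activate the light client is the conservative reading of the text above. A real client may want to handle error responses separately.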
## Validating block prior to world state download
The EL would have backfilled blocks, so any block that's an ancestor of the available world state would already be known and the EL can immediately return valid. The CL should only be syncing blocks that are ancestors of the head determined by the light client, so all blocks would be known to be valid.
Potentially the CL could execute the payload of the block the light client reported as chain head and, once that's confirmed as valid, skip sending any execution payloads from ancestors to the EL while processing those blocks to update its `BeaconState`.
## After the initial sync
If at some point after the initial sync completes the EL returns `SYNCING` the beacon node needs to decide when it should cease its operations and switch back to the light client mode.
If the EL will only be syncing for a short period, the BN should continue running but won't be able to import new blocks. The BN would need to periodically retry executing the payload that caused the `SYNCING` response while later blocks will build up in the pending pool.
If the EL will be syncing for a long period, or the expected short sync winds up taking too long, the BN will need to reactivate the light client. It should unsubscribe from block, attestation and sync committee gossip topics. It should continue reporting the same `STATUS` and is able to serve requested blocks up to what it has imported.
The light client should take its initial `LightClientSnapshot` from the beacon node's state, subscribe to light client updates and resume processing them. The light client would provide `forkChoiceUpdated` events to the EL and BN in the same way as during initial sync.
Exiting this sync phase would be almost identical to exiting initial sync except that the BN would likely process all blocks beginning from where it was up to until it reaches the new chain head. It would be theoretically possible for the BN to download a new `BeaconState` and resume syncing from there, though this would likely require significant changes to most existing clients to support.
### Choosing between short and long sync
It is currently difficult for the BN to know if the EL will take a long time to sync or not and whether the EL requires ongoing `forkChoiceUpdated` calls in order to perform that sync.
My understanding, however, is that the EL only needs on-going `forkChoiceUpdated` notifications to help it download the world state. If it already has a suitable world state, it can follow the chain backwards from the unknown block to the world state it has and then execute forwards.
If the `SYNCING` response included information from the EL about whether it needed on-going head updates, this would make it simple for the BN to decide when to switch back to the light client. The EL would be allowed to initially respond `SYNCING` and indicate it didn't need on-going updates and then later indicate it did need them. This may happen if the EL has a world state it believes can be used but then downloads the ancestors of the required block and discovers they don't descend from that world state. The EL would then need to download a different world state and would require on-going head updates to facilitate that.
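A sketch of how the BN might use such a hint is below. The `needs_head_updates` flag is hypothetical: it stands for the extra information proposed above and does not exist in the current EE API. The timeout for a short sync that overruns is likewise an assumed, client-chosen parameter.

```python
from enum import Enum


class SyncAction(Enum):
    WAIT_AND_RETRY = "wait_and_retry"          # short sync: keep running, retry the payload
    SWITCH_TO_LIGHT_CLIENT = "light_client"    # long sync: reactivate the light client


def choose_sync_action(
    needs_head_updates: bool,     # hypothetical hint carried in the SYNCING response
    seconds_syncing: float,       # how long the EL has been answering SYNCING
    short_sync_timeout: float,    # assumption: client-chosen limit for a short sync
) -> SyncAction:
    """Decide what the BN does each time the EL answers SYNCING."""
    if needs_head_updates:
        # The EL must download a world state, so it needs on-going head updates.
        return SyncAction.SWITCH_TO_LIGHT_CLIENT
    if seconds_syncing > short_sync_timeout:
        # The expected short sync has taken too long.
        return SyncAction.SWITCH_TO_LIGHT_CLIENT
    return SyncAction.WAIT_AND_RETRY
```

Because the function is evaluated on every `SYNCING` response, it naturally handles the EL changing its mind: a response that first reports no need for head updates and later reports that it does need them simply flips the result.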
Notably this distinction would also allow the BN to continue importing blocks from other forks while the EL is performing a "short sync". This neatly resolves the issue where the merge block includes an execution payload with a parent that hasn't actually been published. The EL would return `SYNCING` when that payload is executed, but would assume it could just use its existing world state. The BN would then continue processing blocks on other forks, potentially importing a merge block where the execution payload is available.
The BN would also not immediately enter its sync mode when the EL indicates it is missing parent blocks, but would only do so if it fell far enough behind its peers (as is the case today) or if it needed to switch to the light client. Thus the BN would naturally continue producing blocks and avoid having the chain stall.