# Sync from arbitrary state
###### tags: `weak subjectivity`, `sync`
**Author(s):** Victor Farazdagi (Prysmatic Labs)
*Last Updated: Apr 19, 2021*
[TOC]
## 0. Overview
This document describes the steps necessary to implement syncing from an arbitrary initial state. The primary aim is to allow starting a beacon node from any given state, as if it were a genesis state.
Once the design outlined in this document is implemented, we can make sure that such a sync works with the `--weak-subjectivity-checkpoint block_root:epoch_number` flag, i.e. syncing is only allowed if, in addition to the initial state, a weak subjectivity checkpoint is provided (and the state is within the weak subjectivity period). That will allow us to sync from an arbitrary state without worrying about long-range attacks.
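As an illustration, the `block_root:epoch_number` value could be parsed along these lines. This is a minimal sketch, not Prysm's actual flag handling: the `checkpoint` type and `parseCheckpoint` helper are hypothetical names, and the format of the root (a 32-byte hex string with an optional `0x` prefix) is an assumption.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// checkpoint is a hypothetical container for a weak subjectivity checkpoint.
type checkpoint struct {
	Root  [32]byte
	Epoch uint64
}

// parseCheckpoint parses "block_root:epoch_number". It assumes the root is a
// 32-byte hex string with an optional 0x prefix.
func parseCheckpoint(s string) (checkpoint, error) {
	var cp checkpoint
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return cp, fmt.Errorf("expected block_root:epoch_number, got %q", s)
	}
	raw, err := hex.DecodeString(strings.TrimPrefix(parts[0], "0x"))
	if err != nil {
		return cp, fmt.Errorf("invalid block root: %w", err)
	}
	if len(raw) != 32 {
		return cp, fmt.Errorf("block root must be 32 bytes, got %d", len(raw))
	}
	copy(cp.Root[:], raw)
	cp.Epoch, err = strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return cp, fmt.Errorf("invalid epoch: %w", err)
	}
	return cp, nil
}

func main() {
	cp, err := parseCheckpoint("0x" + strings.Repeat("ab", 32) + ":1234")
	fmt.Println(cp.Epoch, err)
}
```

Rejecting malformed input early keeps the later checks (staleness, root match) free of parsing concerns.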
## 1. Distributing initial state
We need methods to specify the initial state when running a beacon node. Ideally, we should support several such methods. Here is the list (in priority order) of possible initial state distribution methods:
1. Load state from a compressed local or remote file.
1. Embed the state itself right into binaries.
1. Embed URLs of trusted parties into binaries, so that initial state can be fetched when needed.
:::info
**One more way of state distribution (highly experimental)**
We can consider one more way: allow downloading the state for a given weak subjectivity checkpoint. We can expose `/beacon/ws-checkpoints/:block_root:epoch_number/state` for that purpose.
So, when only the `--weak-subjectivity-checkpoint` flag is provided, the state is downloaded from our trusted beacon node. Users can therefore copy/paste checkpoint info from our GitHub and start a node, which will be able to sync from that checkpoint (by first downloading the corresponding state).
This method will certainly NOT be implemented during the first iteration, but is something to think about.
:::
Together, these methods provide a robust way of loading the initial state:
1. The node first checks whether a path (or URL) to a file containing compressed state is specified.
```flow
st=>start: Start node
e=>end: Run node
subCheckEmbeddedState=>subroutine: Check embedded state
opLoadState=>operation: Load state
opIgnoreState=>operation: Ignore state
opDownloadState=>operation: Download state
opSaveState=>operation: Save and use state
condEmptyDB=>condition: Empty DB?
condHasInitState=>condition: Initial state
provided?
condIsLocalFile=>condition: Is local file?
condIsWithinWS=>condition: Within WS period?
condDownloadIsOK=>condition: Downloaded OK?
st->condEmptyDB
condEmptyDB(yes)->condHasInitState
condEmptyDB(no)->opIgnoreState->e
condHasInitState(yes)->condIsLocalFile
condHasInitState(no)->subCheckEmbeddedState
condIsLocalFile(yes)->opLoadState
condIsLocalFile(no)->opDownloadState
opDownloadState(right)->condDownloadIsOK
opLoadState->condIsWithinWS
condDownloadIsOK(yes)->opLoadState
condDownloadIsOK(no)->subCheckEmbeddedState
condIsWithinWS(yes)->opSaveState(right)->e
condIsWithinWS(no)->subCheckEmbeddedState
```
2. If the initial state is not specified as a path to a local file or a URL to a remote file, or it cannot be used (stale state, failed download, etc.), the node checks whether an embedded state exists.
```flow
st=>start: Check embedded state
e=>end: Run node
subCheckEmbeddesURLs=>subroutine: Check embedded URLs
opLoadState=>operation: Load state
opIgnoreState=>operation: Ignore state
opDownloadState=>operation: Download state
opSaveState=>operation: Save and use state
condEmbeddedStateProvided=>condition: Has embedded state?
condIsWithinWS=>condition: Within WS period?
st(right)->condEmbeddedStateProvided
condEmbeddedStateProvided(yes)->opLoadState
condEmbeddedStateProvided(no)->subCheckEmbeddesURLs
opLoadState(right)->condIsWithinWS
opIgnoreState->e
condIsWithinWS(yes)->opSaveState(right)->e
condIsWithinWS(no)->subCheckEmbeddesURLs
```
3. Finally, the node checks whether it can obtain a state from the embedded URLs of trusted third parties.
```flow
st=>start: Check embedded URLs
e=>end: Run node
opLoadState=>operation: Load state
opIgnoreState=>operation: Ignore state
opDownloadState=>operation: Download state
opSaveState=>operation: Save and use state
condIsWithinWS=>condition: Within WS period?
condHasEmbeddedURLs=>condition: Has non-checked
embedded URLs?
condDownloadIsOK=>condition: Downloaded OK?
st(right)->condHasEmbeddedURLs
condHasEmbeddedURLs(yes, right)->opDownloadState
condHasEmbeddedURLs(no)->opIgnoreState
condIsWithinWS(yes)->opSaveState(left)->e
condIsWithinWS(no)->opIgnoreState
condDownloadIsOK(yes)->opLoadState
condDownloadIsOK(no)->condHasEmbeddedURLs
opDownloadState(right)->condDownloadIsOK
opLoadState->condIsWithinWS
opIgnoreState->e
```
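The three flowcharts above can be read as a single fallback chain. The following is a minimal sketch under that reading; `stateSource`, `resolveInitialState` and `errNoState` are hypothetical names, and the weak subjectivity check is passed in as a plain function rather than the real helper.

```go
package main

import (
	"errors"
	"fmt"
)

// stateSource is a hypothetical abstraction over one way of obtaining a state:
// the --initial-state file or URL, the embedded state, or an embedded URL.
type stateSource struct {
	name string
	load func() ([]byte, error) // returns raw state bytes or an error
}

var errNoState = errors.New("no usable initial state")

// resolveInitialState tries sources in priority order and returns the first
// state that loads and is within the weak subjectivity period. On a non-empty
// DB every provided state is ignored.
func resolveInitialState(dbEmpty bool, sources []stateSource, withinWS func([]byte) bool) (string, []byte, error) {
	if !dbEmpty {
		return "", nil, errNoState
	}
	for _, src := range sources {
		state, err := src.load()
		if err != nil || !withinWS(state) {
			continue // failed download or stale state: fall through to the next source
		}
		return src.name, state, nil
	}
	return "", nil, errNoState
}

func main() {
	sources := []stateSource{
		{"cli", func() ([]byte, error) { return nil, errors.New("download failed") }},
		{"embedded", func() ([]byte, error) { return []byte("stale"), nil }},
		{"embedded-url", func() ([]byte, error) { return []byte("fresh"), nil }},
	}
	fresh := func(b []byte) bool { return string(b) == "fresh" }
	name, _, err := resolveInitialState(true, sources, fresh)
	fmt.Println(name, err)
}
```

Modelling every method as the same kind of source keeps the priority order in one place, so adding the experimental checkpoint-download method later would only mean appending another entry.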
### 1.0 Invariants
Regardless of how the initial state was obtained, there are a number of invariants on how it is processed and used:
#### 1.0.1 Initial state is used for node initialization
The initial state is for node **initialization**, so it is used only if the node is started for the first time, or if the node is started with the `--clear-db` flag. If the database is non-empty, the provided initial state is ignored and a warning is emitted.
*Note: this invariant makes it easy to reason about initial states. Otherwise, if a node is restarted with another initial state, we have to decide how to populate several gaps: `genesis -> ..gap.. -> state1 -> ..gap.. -> state2`. It is far easier to treat the initial state as the starting point of an empty node, so we are only responsible for making sure the node can sync from that point and backfill blocks from genesis to that point.*
**Action Items:**
- [ ] Make sure that we do not allow overwriting the state of a non-empty database. Emit a warning if the `--initial-state` flag is used on a non-empty DB.
#### 1.0.2 Protecting from long-range attacks
When providing a node with an initial state, the state must be within the weak subjectivity period (otherwise we are dealing with a stale state). To ensure security, users are expected to provide the `--weak-subjectivity-checkpoint block_root:epoch_number` param as well. The `IsWithinWeakSubjectivityPeriod()` helper will be used to check that the provided checkpoint is not itself stale; if it is, the initial state will be ignored.
*Note: the weak subjectivity checkpoint is useful in any case (stale or not) for making sure that its root is present in our DB; if it is not, the node must terminate.*
**Action items:**
- [ ] Make sure that `--initial-state=PATH` requires `--weak-subjectivity-checkpoint=block_root:epoch_number` to be provided as well. Consider alternative ways of specifying the weak subjectivity checkpoint (embedded, for example).
- [ ] Rely on `IsWithinWeakSubjectivityPeriod()` to make sure that the checkpoint is not stale, i.e. that our node's current epoch is not too far beyond it.
- [ ] Assert that the checkpoint and the input state have the same root and epoch.
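The action items above amount to three checks, sketched below. All names (`checkpoint`, `stateSummary`, `validateInitialState`) are hypothetical, and the staleness test is a simplified stand-in for `IsWithinWeakSubjectivityPeriod()`: it takes the period as a plain number of epochs instead of deriving it, which is an assumption made only to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// checkpoint and stateSummary are hypothetical, simplified types.
type checkpoint struct {
	Root  [32]byte
	Epoch uint64
}

type stateSummary struct {
	BlockRoot [32]byte // root of the block the state corresponds to
	Epoch     uint64
}

// validateInitialState mirrors the action items: a checkpoint must be given,
// it must not be stale, and it must match the provided state.
// wsPeriod is the weak subjectivity period in epochs, supplied by the caller.
func validateInitialState(st stateSummary, cp *checkpoint, currentEpoch, wsPeriod uint64) error {
	if cp == nil {
		return errors.New("--initial-state requires --weak-subjectivity-checkpoint")
	}
	if currentEpoch > cp.Epoch+wsPeriod {
		return fmt.Errorf("checkpoint at epoch %d is stale at epoch %d", cp.Epoch, currentEpoch)
	}
	if st.BlockRoot != cp.Root || st.Epoch != cp.Epoch {
		return errors.New("initial state does not match the weak subjectivity checkpoint")
	}
	return nil
}

func main() {
	root := [32]byte{0xaa}
	cp := &checkpoint{Root: root, Epoch: 100}
	st := stateSummary{BlockRoot: root, Epoch: 100}
	fmt.Println(validateInitialState(st, cp, 150, 256))  // matching and within the period
	fmt.Println(validateInitialState(st, cp, 1000, 256)) // checkpoint too old
}
```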
### 1.1 Load state from a compressed local or remote file
This method allows providing a path to a compressed state file (either a path to a local file or a URL to a remote one).
When users want to provide the initial state as a local file, the state must be fetched from a trusted third party and saved locally. Then, the node is started with the `--initial-state=/path/to/state.ssz.snappy` flag (or we can use the `ssz_snappy` extension, for consistency with how ETH2 specs outputs are generated, see [ethereum/eth2.0-specs/pull/2097](https://github.com/ethereum/eth2.0-specs/pull/2097)).
Passing a URL to a remote state file is very similar (`--initial-state=https://raw.githubusercontent.com/prysmaticlabs/prysm/develop/states/finalized.ssz.snappy`), but the node itself is responsible for downloading the state.
**Action items:**
- [ ] Add `--initial-state=PATH` flag.
- [ ] Make sure that `--initial-state` accepts a URL, and can download, parse and load the state from that URL.
- [ ] Add decompressing, parsing and loading of the provided `*.ssz.snappy` file into `BeaconState`. This can take the form of a new helper method `LoadInitialStateFromSSZ(compressedState []byte) error`, which will be responsible for parsing and loading the state into the DB.
*Note: we can re-use code from how we parse and load the embedded genesis state.*
**Expected outcomes:**
- [ ] Make sure that the loaded `BeaconState` is ready to be used in node syncing (both regular and init-sync).
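A minimal sketch of how a single `--initial-state` value could cover both the local and the remote case. The helpers `isRemote` and `readCompressedState` are hypothetical; the sketch only obtains the raw compressed bytes and leaves decompression and SSZ decoding to a helper such as the proposed `LoadInitialStateFromSSZ`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// isRemote reports whether the --initial-state value looks like an HTTP(S) URL
// rather than a local path.
func isRemote(src string) bool {
	u, err := url.Parse(src)
	return err == nil && (u.Scheme == "http" || u.Scheme == "https")
}

// readCompressedState returns the raw (still compressed) bytes of the state
// file, downloading it first when src is a URL.
func readCompressedState(src string) ([]byte, error) {
	if !isRemote(src) {
		return os.ReadFile(src)
	}
	resp, err := http.Get(src)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("download failed: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Write a stand-in state file and read it back through the same code path
	// a local --initial-state value would take.
	f, err := os.CreateTemp("", "state-*.ssz.snappy")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.Write([]byte("compressed-state")); err != nil {
		panic(err)
	}
	f.Close()

	data, err := readCompressedState(f.Name())
	fmt.Println(len(data), err)
}
```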
### 1.2 Embed the state itself right into binaries
On releases, the latest weak subjectivity checkpoint and its state should be embedded right into the binary. The node will then be able to sync from that point (quickly), without requiring users to obtain the initial state themselves.
**Action items:**
- [ ] Update node code to use pre-defined embedded state:
```golang
import _ "embed" // needed for the go:embed directive below

//go:embed initial.ssz.snappy
var initialStateRawSSZCompressed []byte
```
- [ ] Make sure that the embedded state is used only if it is present, not stale, and no initial state has been supplied via the CLI `--initial-state` argument.
- [ ] Ideally, we should have a CLI script that helps us assemble the binary and save the most recent weak subjectivity checkpoint and the corresponding state.
- [ ] Consider whether the weak subjectivity checkpoint data can be embedded as well.
**Expected outcomes:**
- [ ] On releases, the binary will contain an updated embedded state, which will be used as the initial state on empty databases (when no initial state is provided via CLI arguments).
### 1.3 Embed URLs of trusted parties into binaries
This is basically a follow-up to the `--initial-state=URL` functionality that adds usability and robustness to the method.
Consider the following embedded data:
```golang
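// downloadableState describes a state file and where it can be fetched from.
type downloadableState struct {
	hash string   // expected hash of the downloaded file
	urls []string // trusted providers
}

var embeddedInitialState = downloadableState{
	hash: "9bef...0155",
	urls: []string{
		"https://prysmaticlabs.com/uploads/states/v0.7.1.ssz.snappy",
		"https://beaconscan.com/ws_checkpoint/state.ssz",
		// some more URLs ...
	},
}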
```
If the initial state is not provided as a CLI argument (or it is stale, unavailable for download, etc.), the node looks for the embedded state. If that is not available (or is stale), the node finally goes through the list of providers in an attempt to obtain the state.
**Action items:**
- [ ] Implement and test functionality where the node traverses the list of embedded URLs and obtains the initial state from one of them.
- [ ] Consider using meta-links (e.g. `https://foo.bar/state/latest-finalized.ssz.snappy`), where the hash cannot be known in advance, so file integrity cannot be guaranteed.
**Expected outcomes:**
- [ ] Ability to robustly download state from one of the trusted state providers.
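The traversal of embedded URLs could look like the sketch below. It reuses the shape of `downloadableState` from the example above; `fetchFunc`, `digest` and `fetchFromProviders` are hypothetical names, and SHA-256 is an assumption, since the hash function is not specified in this document. The download itself is injected as a function so that the fallback logic can be exercised without network access.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

type downloadableState struct {
	hash string   // expected hex digest of the file (SHA-256 is an assumption)
	urls []string // trusted providers, tried in order
}

// fetchFunc abstracts the download so the traversal can be tested offline.
type fetchFunc func(url string) ([]byte, error)

// digest returns the hex-encoded SHA-256 digest of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// fetchFromProviders tries each URL in order and returns the first payload
// whose digest matches the embedded hash. Failed downloads and mismatching
// payloads are skipped.
func fetchFromProviders(s downloadableState, fetch fetchFunc) ([]byte, error) {
	for _, u := range s.urls {
		data, err := fetch(u)
		if err != nil {
			continue // provider unavailable, try the next one
		}
		if digest(data) == s.hash {
			return data, nil
		}
	}
	return nil, errors.New("no provider returned a state matching the embedded hash")
}

func main() {
	good := []byte("finalized state bytes")
	s := downloadableState{
		hash: digest(good),
		urls: []string{"https://a.example/state", "https://b.example/state"},
	}
	fetch := func(u string) ([]byte, error) {
		if u == "https://a.example/state" {
			return nil, errors.New("unreachable")
		}
		return good, nil
	}
	data, err := fetchFromProviders(s, fetch)
	fmt.Println(len(data), err)
}
```

Because every payload is checked against the embedded hash, a compromised provider can at worst cause a fallback to the next URL. Meta-links lose exactly this property.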
## 2. Syncing
### 2.0 Overview
We assume that the chain has started and that the obtained beacon state is valid. Ideally, Prysm should be able to proceed from a given state w/o any major refactoring, i.e. core components should rely on the given state and be able to progress in the absence of previous states (blocks).
### 2.1 Changes required to the existing components
Syncing from a state that doesn't have historical blocks/states between it and genesis requires updating some of our existing components.
The list of components below is incomplete: as we proceed with the implementation, we will certainly find more packages that assume the presence of historical blocks/states.
#### 2.1.1 `sync` and `initial-sync` packages
- [ ] Either obtain the block for the given state, or make sure that block processing works for successive slots w/o requiring the parent block to exist in the DB. At the moment, we get "parent block not found in DB" when trying to build on top of the state.
- [ ] Ability to restart a node that has been started with an initial state. For this, the provided initial state (or at least some metadata) must be saved in the DB. On restart, it will not be re-saved (as the DB is no longer empty), but the system should know that the node started from an arbitrary state and that some extra operations are required (like back-filling previous blocks, for instance).
*Note: we can implement this by storing `InitialStateCheckpoint` in the database: whenever it exists, we know the root and slot of the provided state.*
- [ ] When an initial state is provided, in addition to saving it, we need to make sure that the head block is also updated. Again, this means we either obtain the block for the given state, or update our code in such a way that `block_root` is enough when a head block is expected.
- [ ] Similarly for justified and finalized checkpoints: the provided state is considered finalized, so when saving it, we need to make sure that the checkpoints are updated accordingly.
#### 2.1.2 `stategen` package
- [ ] Double check that the block of the base state is not required when regenerating states. That is, an arbitrary state should be enough to apply more blocks on top of it.
#### 2.1.3 `rpc` package
- [ ] Make sure that both `BeaconBlocksByRange` and `BeaconBlocksByRoot` work correctly when queried for non-existent historical blocks (that is, we have blocks close to the head, but there is a gap between genesis and the initial state).
- [ ] The RPC module should be checked as a whole (not only by-range and by-root requests): what happens if a non-existent state is requested? Terence suggested emitting an error with a good message (e.g. "Error: node started syncing in epoch X, unable to retrieve block in epoch X - Y").
- [ ] It is probably worth exploring whether we want to keep peers with a finalized epoch lower than that of the provided initial state, and if so, what portion of those peers to keep. They will be useful for back-filling historical blocks, but not for progressing.
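The suggested error message could be produced by a small guard like the one below. This is only a sketch: `checkEpochAvailable` is a hypothetical helper, and it assumes the node tracks the earliest epoch for which it holds blocks (a value that would move towards genesis as back-filling progresses).

```go
package main

import "fmt"

// checkEpochAvailable returns a descriptive error when the requested epoch
// lies in the gap that has not been back-filled yet. earliest is the lowest
// epoch for which the node currently stores blocks.
func checkEpochAvailable(requested, earliest uint64) error {
	if requested < earliest {
		return fmt.Errorf("node started syncing in epoch %d, unable to retrieve block in epoch %d",
			earliest, requested)
	}
	return nil
}

func main() {
	fmt.Println(checkEpochAvailable(90, 100))  // inside the gap
	fmt.Println(checkEpochAvailable(120, 100)) // available
}
```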
#### 2.1.4 `blockchain` package
- [ ] In `chain_info.go` we define the (heavily used) `HeadRoot()` and `HeadBlock()` methods. When starting from an arbitrary state we do not have the full block available (but we do have the block header, if necessary), so we need to check how our block-less heads will behave.
### 2.2 New components
Ideally, no new components should be required for implementing the sync from an arbitrary base state, i.e. there are some places that may rely on historical states/blocks, and we need to update those, but other than that our existing code should handle the sync w/o any issues.
## 3. Back-filling historical data
### 3.0 Overview
When a node is started from an arbitrary initial state, we have the following data layout:
```mermaid
graph LR
genesis[Genesis] -->|...historical states...| init-state[Arbitrary Initial State] --> |...non-synced states...| canonical-state[Head of canonical chain]
```
All our current components (with minor refactoring) will be able to proceed and build on top of the arbitrary initial state:
```mermaid
graph LR
genesis[Genesis] -->|...historical states...| init-state[Arbitrary Initial State]
subgraph one[Sync to the head]
init-state --> |...non-synced states...| canonical-state[Head of canonical chain]
end
```
Now, the tricky part: syncing back historical blocks (and then regenerating historical states), all w/o interfering with the normal sync of un-synced states.
```mermaid
graph LR
subgraph two[Sync historical blocks]
genesis[Genesis] -->|...historical states...| init-state[Arbitrary Initial State]
end
init-state --> |...non-synced states...| canonical-state[Head of canonical chain]
```
Now, since the arbitrary state will be within the weak subjectivity period, and will contain the block header of a finalized block (the whole point of being w/i the WS period is to make sure that if there is some other finalized block, then 1/3 of validators will get slashed), we can go back in history and find the immediately preceding finalized block, then the one before it, and so forth. This is possible because each finalized block has enough information about its parent block, which can then be retrieved.
After the back-filling procedure, the beacon DB should be in exactly the same state as if the sync had started from genesis, w/o any intermediary initial state (assuming that there weren't any forks, since when going back we already know the canonical chain).
Note: all the historical data, like `block_roots`, `state_roots`, and `historical_roots`, is already present within the provided initial state even before any back-filling takes place, and **will not** be overwritten by the back-filling procedure, as we are going backwards.
### 3.1 Changes required to the existing components
Ideally, no changes to the existing components will be required for the back-filling procedure. Our sync process should already have been updated to start from the arbitrary state, and the back-filling procedure should not interfere with it at all.
### 3.2 New components required
In order for the back-filling procedure to have as little impact on other running components as possible, we should introduce a separate component: `historical-sync` or `backward-sync`.
If possible, that new component should rely on the very same abstractions that `initial-sync` is using (queue, FSMs), as they have proved to be very robust when it comes to syncing from peers with possibly incomplete data.
When going back, we have the advantage of knowing which blocks to include and which shouldn't go into the DB. So, starting from our initial state's `parent_root` (from the state's known block header), we can request the block with that root, then the block with the `parent_root` of that block, and so forth, i.e. start from the state's known finalized block, get back to the previous finalized block, and so on, up until genesis:
```mermaid
graph RL
subgraph b4["(n-4)"th block]
further[...]
end
subgraph b3["(n-3)"th block]
b3_root[root]
b3_parent_root[parent_root] -->further
end
subgraph b2["(n-2)"th block]
b2_parent_root[parent_root]-->b3_root
b2_root[root]
end
subgraph b1["(n-1)"th block]
b1_root[root]
b1_parent_root[parent_root]-->b2_root
end
subgraph state
block_header[block_header.parent_root]-->b1_root
end
```
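The chain of `parent_root` links in the diagram can be followed with a simple loop. The sketch below uses a hypothetical `block` type and a `byRoot` lookup function standing in for a blocks-by-root request to peers; it also shows the terminal error for the case where slot 0 is reached without matching the genesis root.

```go
package main

import (
	"errors"
	"fmt"
)

// block is a hypothetical, stripped-down view of a beacon block.
type block struct {
	Slot       uint64
	Root       [32]byte
	ParentRoot [32]byte
}

// walkBack follows parent_root links from start (the parent_root in the
// state's block header) until the genesis root is reached. It returns the
// visited blocks, newest first.
func walkBack(start, genesisRoot [32]byte, byRoot func([32]byte) (block, bool)) ([]block, error) {
	var chain []block
	next := start
	for next != genesisRoot {
		b, ok := byRoot(next)
		if !ok {
			return chain, fmt.Errorf("block %x not found", next)
		}
		// Slots must strictly decrease, which also guarantees termination.
		if n := len(chain); n > 0 && b.Slot >= chain[n-1].Slot {
			return chain, errors.New("parent slot is not lower than child slot")
		}
		chain = append(chain, b)
		if b.Slot == 0 {
			return chain, errors.New("reached slot 0 without matching the genesis root")
		}
		next = b.ParentRoot
	}
	return chain, nil
}

func main() {
	genesis := [32]byte{0xff}
	blocks := map[[32]byte]block{
		[32]byte{1}: {Slot: 1, Root: [32]byte{1}, ParentRoot: genesis},
		[32]byte{2}: {Slot: 2, Root: [32]byte{2}, ParentRoot: [32]byte{1}},
	}
	byRoot := func(r [32]byte) (block, bool) { b, ok := blocks[r]; return b, ok }
	chain, err := walkBack([32]byte{2}, genesis, byRoot)
	fmt.Println(len(chain), err)
}
```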
Of course, pulling blocks one by one is very inefficient, so we will still rely on fetching blocks by range, but will further filter those batches to make sure that only the chain of finalized blocks is processed (in a backwards fashion). That is, the system will save a block only if its root matches the expected root, i.e. the `parent_root` of the previously saved **finalized** block.
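Filtering a by-range batch against the expected root might look like the following sketch. The `block` type and `filterBatch` function are hypothetical, and the batch is assumed to be ordered by ascending slot.

```go
package main

import "fmt"

// block is a hypothetical, stripped-down view of a beacon block.
type block struct {
	Slot       uint64
	Root       [32]byte
	ParentRoot [32]byte
}

// filterBatch walks a batch (assumed to be in ascending slot order) backwards
// and keeps only the blocks that extend the already saved chain. expected is
// the parent_root of the earliest block saved so far. It returns the kept
// blocks (newest first) and the root expected from the next, older batch.
func filterBatch(batch []block, expected [32]byte) (kept []block, next [32]byte) {
	next = expected
	for i := len(batch) - 1; i >= 0; i-- {
		if batch[i].Root != next {
			continue // not on the known chain (e.g. a forked block), skip it
		}
		kept = append(kept, batch[i])
		next = batch[i].ParentRoot
	}
	return kept, next
}

func main() {
	batch := []block{
		{Slot: 4, Root: [32]byte{4}, ParentRoot: [32]byte{3}},
		{Slot: 5, Root: [32]byte{5}, ParentRoot: [32]byte{4}},
		{Slot: 5, Root: [32]byte{0xf5}, ParentRoot: [32]byte{4}}, // forked block
		{Slot: 6, Root: [32]byte{6}, ParentRoot: [32]byte{5}},
	}
	kept, next := filterBatch(batch, [32]byte{6})
	fmt.Println(len(kept), next[0])
}
```

Since each kept block fixes the root expected from its predecessor, forked blocks in a batch are dropped without any extra fork-choice logic.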
Once all blocks are pulled into our database, it is time to regenerate all the intermediary states. The last state generated will be compared with the provided initial state, and the two are expected to match!
**Action items:**
- [ ] Introduce a new `historical-sync` service.
- [ ] We need to track the earliest known finalized checkpoint (as we will be pulling its parent block when going backwards in history), so probably introduce `earliestFinalizedCheckpoint`.
- [ ] Adapt the FSMs queue to proceed in the backward direction.
- [ ] Allow quickly filtering out non-matching blocks from block batches (we will pull data using blocks-by-range requests, but will then make sure that we save data in the backwards direction, checking `block(n).parent_root == block(n-1).root`).
- [ ] Make sure that the whole back-filling routine runs in complete isolation: the only external effects are that block data is updated in the database, and that the various checkpoints keeping track of the earliest unknown block are updated.
- [ ] If those checkpoints are updated, then the node should survive restarts w/o any issue.
- [ ] Make sure that back-filling always ends either at the genesis block or in a terminal error if the genesis block is unreachable (that is, we arrive at the 0th slot and our earliest block doesn't match genesis).
- [ ] Make sure that as soon as some block range is pushed into the database, `BeaconBlocksByRange` can return it via RPC.
- [ ] Once all blocks are available (and all of them finalized), we can regenerate historical states.
## 4. References
- [Weak Subjectivity in Eth2.0](https://notes.ethereum.org/@adiasg/weak-subjectvity-eth2) by Aditya Asgaonkar
- [Shipping With Genesis States](https://hackmd.io/@prysmaticlabs/shipping_with_genesis) by Preston Van Loon
- [Weak Subjectivity: implementation roadmap](https://hackmd.io/V9vY48R6QHCN0whHbPjk9g?view) by Victor Farazdagi