# Canonical ledger state

Paul Clark, April 2025

### Introduction

Mithril is used to distribute data snapshots signed by nodes representing sufficient stake to make them trustworthy. Its primary use is currently the fast bootstrap of nodes, which saves them from having to download all blocks using the Ouroboros mini-protocols.

Because Mithril collects multiple signatures for data produced by different nodes, all the signers must see exactly the same data, down to the individual byte representation level. Currently, Mithril can only rely on the exact representation of blocks and transactions, which are stored in exactly the same format as they appear on chain, and hence every node will produce the same data snapshot (the "immutable-db" files).

These files can be used to download the entire history of the chain without having to use the chain-sync and block-fetch protocols, but the receiver still has to replay all the transactions from the beginning to derive the ledger state at the time of the snapshot, which allows it to continue to sync from there on. This can take between an hour and several days depending on the receiving client.

### Needed improvement

It would be better if a bootstrapping node or other client could fetch a reliable snapshot of the current ledger state (the entire dynamic state of the chain), which would save it from having to replay all the transactions. Such a snapshot should also be considerably smaller (the current immutable-db snapshot is around 55GB compressed).

Mithril does also offer a ledger state snapshot in the downloaded package, but this cannot be signed by the decentralised set of signers because there is no standard ("canonical") format for how the state is represented which would guarantee that two nodes correctly synced to the same block will produce exactly the same output.
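To make the byte-level problem concrete, here is a minimal sketch in which two nodes hold the same logical state but serialise it with a different field order. The state contents and names are hypothetical, and JSON with SHA-256 is used only as a stand-in available in the Python standard library; it is not the format or hash proposed in this document.

```python
import hashlib
import json

# Hypothetical miniature ledger state held by two correctly synced nodes.
# The content is identical; only the order in which fields were inserted differs.
node_a_state = {"pots": {"reserves": 42, "treasury": 7}, "utxos": {"tx1#0": 5_000_000}}
node_b_state = {"utxos": {"tx1#0": 5_000_000}, "pots": {"treasury": 7, "reserves": 42}}


def digest(state: dict) -> str:
    """Hash the serialised state exactly as written, with no canonicalisation."""
    return hashlib.sha256(json.dumps(state).encode("utf-8")).hexdigest()


same_content = node_a_state == node_b_state                  # logical comparison
same_digest = digest(node_a_state) == digest(node_b_state)   # byte-level comparison

print(f"same logical content: {same_content}")
print(f"same digest:          {same_digest}")
```

The two states compare equal, yet their digests differ, so signers running these two nodes could not agree on a signature over the output.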
Also, there isn't currently any agreement on *when* the snapshots will be produced. It is too much to expect a node to produce one for every block, so we can't guarantee a meaningful sample of snapshots at any particular block height.

Although the Mithril service produces a snapshot and gets the signers to sign a *digest* of it, this is a centralised mechanism which goes against the philosophy of Cardano and requires the client to trust the service.

### Summary requirement

So the requirement is a standardised file format (or formats) which deterministically and implementation-independently defines the byte-level representation of the ledger data snapshot and contains all the data required to bootstrap a node from an arbitrary position in the chain, together with a simple deterministic way to trigger the snapshot which works the same for all nodes, independently of their local clock.

The file format should be reasonably compact (although note that Mithril compresses files in any case), defined by a strong schema notation and capable of being encoded in a deterministic form which removes any variation in ordering of values, number encoding formats and so on. Ideally it would use standard tools and formats already used in Cardano. We propose deterministically encoded CBOR ([RFC8949 Sec 4.2](https://datatracker.ietf.org/doc/html/rfc8949#section-4.2), with the tighter profile defined in [dCBOR](https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/)) with CDDL schemas as the base format and tooling.

### Additional use cases

Such a "canonical ledger format" also delivers further benefits.

Firstly, it can be used as a format to represent the start and end states for state transition tests, both internally for development and potentially as part of a conformance test suite.
Secondly, to improve confidence and safety where there are multiple node implementations, there is a proposal to include a ledger state hash in the block header (see CIP#xx) so that any divergence in state between implementations can be detected immediately. The canonical ledger state format can serve as the preimage for this hash.

Thirdly, there is the potential to include elements of the ledger state in a Merkle tree, allowing a compact proof that a particular element had a particular state at the time of the snapshot. Without specifying this mechanism in detail here, it requires that we can identify an element of the ledger state with a path string such as a JSON Pointer ([RFC 6901](https://www.rfc-editor.org/rfc/rfc6901)). In particular, elements that are likely to be referenced by ID (e.g. UTXO Tx:Index pair, or SPO ID) should be stored as a map (JSON object) rather than an array of tuples.

### Snapshot contents

The essential required data in a snapshot includes:

* UTXOs
* Stake delegation
* Reward accounts
* Protocol parameters
* Accounting pots (reserves, deposits etc.)
* SPO state
* DRep state
* Governance action general state
* Governance action voting state
* Header state (e.g. nonces)

It could also be useful to produce derived data that is commonly needed:

* Payment address balances
* Stake address balances
* Stake pool delegation distribution

### Timing and load

How often should the snapshot be produced? At one extreme, it would clearly be unreasonable to expect a node to produce a full snapshot at every block. At the other, it really needs to be done at least once per epoch to generate useful delegation and governance data. The tradeoff is between the cost of producing the snapshot and the time and relay node capacity needed for a client to catch up from an old snapshot.
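As a rough feel for this tradeoff, the sketch below estimates how many blocks separate consecutive snapshots at a few candidate cadences. It uses the average block time of 20 seconds quoted in this document and assumes a 5-day epoch; the function names are hypothetical, and the figures ignore any extra lag from waiting for blocks to become immutable.

```python
SECONDS_PER_BLOCK = 20              # average block time used in this document
EPOCH_SECONDS = 5 * 24 * 60 * 60    # assumption: a 5-day epoch


def blocks_per_interval(interval_seconds: int) -> int:
    """Average number of blocks produced between two consecutive snapshots."""
    return interval_seconds // SECONDS_PER_BLOCK


def snapshots_per_epoch(interval_seconds: int) -> float:
    """How many snapshots a node would have to write in one epoch."""
    return EPOCH_SECONDS / interval_seconds


cadences = [("1 hour", 3600), ("4 hours", 4 * 3600), ("1 day", 86400), ("1 epoch", EPOCH_SECONDS)]
for label, seconds in cadences:
    print(f"{label:>8}: about {blocks_per_interval(seconds):>6} blocks between snapshots, "
          f"{snapshots_per_epoch(seconds):>6.1f} snapshots per epoch")
```

Shorter intervals mean fewer blocks for a client to replay but more snapshots for every node to write.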
Since Mithril typically produces a snapshot every four hours, this seems like a reasonable frequency for ledger state snapshots as well. It would not put too much load on the node, particularly as the snapshot can be written asynchronously, so long as an atomic snapshot is taken in memory. The current node already does this at every block.

It is also worth noting that it makes no sense to snapshot the results of volatile blocks (that is, those within the security parameter k=2160 blocks of the tip). If we did, and there was a rollback which unwound the block at which we took the snapshot, the snapshot would be invalid, and we have no way of signalling its invalidity to clients, even if we wanted to push this complexity onto them.

We therefore need to snapshot, at the earliest, when a block enters the immutable chain. Happily, this is when the current node (and probably other implementations too) will store the block into its immutable data store.

We therefore need a simple repeatable algorithm which triggers a snapshot at a deterministic cadence, independently of the local clock, and which only captures immutable blocks. We propose the following:

```python
SECURITY_PARAM_K = 2160                # blocks within k of the tip are volatile
SNAPSHOT_INTERVAL = 4 * 60 * 60 // 20  # = 720 blocks: 4 hours at an average of 20 s per block

if block_number % SNAPSHOT_INTERVAL == 0 and tip_number - block_number >= SECURITY_PARAM_K:
    trigger_snapshot()
```

The snapshot should represent the ledger state *after* processing the given block number.

### Modularity

The snapshot decomposes into a number of logical areas, some of which are reasonably large (UTXO, stake and rewards) while others are tiny (protocol parameters, pots).
It makes sense to keep each logical area in a separate file, which has a number of benefits:

* Special-purpose snapshots can be created with only a subset
* Clients that require only a subset need only read some of the files
* It is naturally extensible for future additions
* It is easier for a modular ledger implementation to produce

### Forwards and backwards compatibility

If we want to be able to update and change the ledger state snapshot format in the future, we need to indicate the version somewhere. JSON (via CBOR) is naturally forwards- and backwards-compatible when using object properties with defaults, so an extension to the format need not necessarily require version-specific code. As mentioned above, if a whole new area is created, we can simply add a new file.

However, since a change to the specification could result in a different binary encoding of the same data, we need a process to manage version changes to avoid the signers signing different versions. If the change happens as the result of a hard fork (the most likely case) then this happens naturally as SPOs transition between protocol numbers. If there was a requirement to change the snapshot format between hard forks, updated nodes would need to produce both the old and the new format for a time, until a stake majority was signing the new format.

We therefore propose that the files should be placed in a subdirectory tagged with the version number (e.g. `ledger-v1`) so that both formats can be produced in the same snapshot if required.

### Snapshot block information and metadata

For node bootstrap, we need to know the block height, slot number and block hash (together the "point") *after which* the snapshot was taken, to pass to the chain-sync protocol for ongoing synchronisation. This is kept in the `block-data.cbor` file included in the snapshot.
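The three values that make up the point could be modelled as below. This is only a sketch: the class name, field names and the 32-byte hash length are assumptions for illustration, and the actual layout is whatever the CDDL for the block data file defines.

```python
from dataclasses import dataclass

HASH_LENGTH = 32  # assumption: a 32-byte block hash


@dataclass(frozen=True)
class SnapshotPoint:
    """Hypothetical in-memory view of the block data record."""

    block_height: int
    slot: int
    block_hash: bytes

    def __post_init__(self) -> None:
        # Reject values that could never identify a real block.
        if self.block_height < 0 or self.slot < 0:
            raise ValueError("block height and slot must be non-negative")
        if len(self.block_hash) != HASH_LENGTH:
            raise ValueError(f"block hash must be {HASH_LENGTH} bytes")

    def describe(self) -> str:
        """Short human-readable form, e.g. for logging."""
        return f"block {self.block_height} at slot {self.slot} ({self.block_hash.hex()[:16]}...)"


# Placeholder values, not taken from any real chain.
point = SnapshotPoint(block_height=720, slot=14_400, block_hash=bytes.fromhex("ab" * 32))
print(point.describe())
```

Keeping the record this small is deliberate: every field is derived purely from the chain, so all nodes produce the same values.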
It might be tempting to place other data, such as the originating node's type and version, in the block data file as well, but this is of course Kryptonite for the requirement that the file be the same across all nodes. We have however added an unsigned, information-only JSON file for this purpose. We use JSON rather than CBOR to make it more human-readable and also to flag that it is not to be relied on, since it is not signed.

## File definitions

Within directory `ledger-v1`:

**Basic:**

* `block-data.cbor`
* `utxos.cbor`
* `stake.cbor`
* `rewards.cbor`
* `parameters.cbor`
* `pots.cbor`
* `spos.cbor`
* `dreps.cbor`
* `governance.cbor`
* `votes.cbor`

**Derived:**

* `spdd.cbor`
* `payment-addresses.cbor`
* `stake-addresses.cbor`

**Unsigned:**

* `metadata.json`

[Link to CDDLs of each file]
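Each of the `.cbor` files above is meant to be byte-identical across nodes. As a rough illustration of what the deterministic encoding rules of RFC 8949 Section 4.2 involve, the sketch below implements a tiny encoder for integers, byte strings, text strings, arrays and maps only. It uses shortest-form heads, definite lengths and map keys sorted bytewise by their encoded form. It does not cover floats, tags or the additional dCBOR constraints, and the sample data is hypothetical; a real implementation would use an existing CBOR library.

```python
import hashlib
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in its shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("integer does not fit in 64 bits")


def encode(item) -> bytes:
    """Deterministically encode a small subset of CBOR data items."""
    if isinstance(item, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(item, int):
        # Major type 0 for non-negative, major type 1 for negative integers.
        return _head(0, item) if item >= 0 else _head(1, -1 - item)
    if isinstance(item, bytes):
        return _head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(item, (list, tuple)):
        return _head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        # Sort entries by the bytewise order of their encoded keys.
        pairs = sorted((encode(k), encode(v)) for k, v in item.items())
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Hypothetical pots data built in a different order on two nodes.
pots_a = {"reserves": 42, "deposits": 7}
pots_b = {"deposits": 7, "reserves": 42}

bytes_a, bytes_b = encode(pots_a), encode(pots_b)
print(bytes_a.hex())
print(hashlib.sha256(bytes_a).hexdigest() == hashlib.sha256(bytes_b).hexdigest())
```

Because key order and integer width are fixed by the rules rather than by the producer, both nodes emit the same bytes and therefore the same digest.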