# ADR-040 Implementation
Plan for the implementation of [ADR-040](https://github.com/cosmos/cosmos-sdk/blob/eb7d939f86c6cd7b4218492364cdda3f649f06b5/docs/architecture/adr-040-storage-and-smt-state-commitments.md). Covers:
* ABCI-streamed state-sync snapshots using BadgerDB and RocksDB backends
* Versioning using new DB backends
* Updates to LL-SMT `MapStore`
* How to decouple state commitment and storage concerns in KV stores
## Implementing ABCI snapshots for new Tendermint DB backends
### Approach
We will attempt to make all needed changes conform to the same interface used in the `Snapshot`/`Restore` methods implemented by `rootmulti.Store`. This is the only type that will need to be serialized for snapshots, hence the only type to implement the `Snapshotter` interface. Adapting this pattern for the new backends will mostly be a matter of defining the serialization units for export/import.
The most straightforward way to do this is to just define a new `SnapshotItem` message type encapsulating a KV-pair, and stream all entries from a store in this format. This will allow the SDK stores to be completely agnostic to the backend used, while the existing export/import procedure will remain mostly the same.
* Both new backends also provide native dump (or "backup") features. These could potentially be used to implement ABCI snapshots for some performance benefit, but this would force the bootstrapping node to use the same backend for the same stores as its peers, since the data would be in a backend-specific format. So, this option can probably be ruled out.
### Interface
We can define new generic interface types to export and import encoded messages, and implement these for all relevant persistent stores. These will reflect the current IAVL export types, but deal directly with `SnapshotItem`s (or another generic type). The existing logic can be refactored accordingly.
```go
// PortableStore is a store whose contents can be exported from and imported at a given version.
type PortableStore interface {
	Export(uint64) (Exporter, error)
	Import(uint64) (Importer, error)
}

// Exporter streams items out of a store until exhausted.
type Exporter interface {
	Next() (SnapshotItem, error)
	Close()
}

// Importer accumulates items and commits them into a store.
type Importer interface {
	Add(SnapshotItem)
	Commit() error
	Close()
}
```
### Implementation
#### BadgerDB
The `NewTransactionAt` method provides a view of the DB at a specific version. In "managed" mode, the version (timestamp) can be an arbitrary integer.
* `CommitAt` also allows restoring state at a past version, although we seem to have no use case for this as of yet.
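A minimal sketch of how these calls fit together, assuming BadgerDB v3's managed mode (the path, keys, and values here are placeholders, not SDK code):
```go
package main

import (
	"fmt"

	"github.com/dgraph-io/badger/v3"
)

func main() {
	// OpenManaged puts BadgerDB in "managed" mode, where commit/read timestamps
	// are supplied by the caller and can be arbitrary integers (e.g. block heights).
	db, err := badger.OpenManaged(badger.DefaultOptions("/tmp/badger-demo"))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Write a value and commit it at "version" 5.
	txn := db.NewTransactionAt(5, true)
	if err := txn.Set([]byte("key"), []byte("value@5")); err != nil {
		panic(err)
	}
	if err := txn.CommitAt(5, nil); err != nil {
		panic(err)
	}

	// NewTransactionAt with update=false gives a read-only view of the DB at version 5.
	view := db.NewTransactionAt(5, false)
	defer view.Discard()
	item, err := view.Get([]byte("key"))
	if err != nil {
		panic(err)
	}
	val, _ := item.ValueCopy(nil)
	fmt.Printf("key at version 5: %s\n", val)
}
```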
#### RocksDB
RocksDB [snapshots](https://github.com/facebook/rocksdb/wiki/Snapshot) are an ephemeral view of the DB at a specific point in time. This is enough to create a snapshot directly after a commit.
* If we want to support creation from a past block height, that can be implemented with persistent copy-on-write _checkpoints_ (which will be needed for versioned queries as well; [see below](#versioning)).
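For illustration, a rough sketch using the `grocksdb` Go bindings (the choice of bindings and the paths are assumptions, not part of this plan), showing both an ephemeral snapshot and a persistent checkpoint:
```go
package main

import (
	"fmt"

	"github.com/linxGnu/grocksdb"
)

func main() {
	opts := grocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	db, err := grocksdb.OpenDb(opts, "/tmp/rocks-demo")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Ephemeral snapshot: reads through this ReadOptions see the DB as of this instant.
	snap := db.NewSnapshot()
	defer db.ReleaseSnapshot(snap)
	ro := grocksdb.NewDefaultReadOptions()
	ro.SetSnapshot(snap)
	val, err := db.Get(ro, []byte("key"))
	if err != nil {
		panic(err)
	}
	defer val.Free()

	// Persistent checkpoint: a copy-on-write, on-disk copy of the DB that can be
	// re-opened later (e.g. named after the committed block height).
	ckpt, err := db.NewCheckpoint()
	if err != nil {
		panic(err)
	}
	defer ckpt.Destroy()
	if err := ckpt.CreateCheckpoint("/tmp/rocks-demo-checkpoints/5", 0); err != nil {
		panic(err)
	}
	fmt.Println("snapshot and checkpoint created")
}
```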
#### Encoding
We define the following new protobuf message and add it as a new case in the `SnapshotItem` `oneof`:
```protobuf
message SnapshotKVItem {
  bytes key   = 1;
  bytes value = 2;
}

message SnapshotItem {
  oneof item {
    ...
    SnapshotKVItem kv = 3;
  }
}
```
The `Exporter`/`Importer` implementations for BadgerDB and RocksDB both stream instances of `SnapshotKVItem`. The `IAVL` handling code can also be wrapped so that the multistore code only has to deal with this generic interface.
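As a sketch, a BadgerDB-backed `Exporter` could look roughly like the following. The `SnapshotItem_Kv` wrapper name and field layout are assumptions that depend on the generated protobuf code, and `io.EOF` is used here only as an end-of-stream sentinel:
```go
import (
	"io"

	"github.com/dgraph-io/badger/v3"
)

// badgerExporter streams every KV pair visible at a fixed version.
type badgerExporter struct {
	txn *badger.Txn
	it  *badger.Iterator
}

func newBadgerExporter(db *badger.DB, version uint64) *badgerExporter {
	txn := db.NewTransactionAt(version, false) // read-only view at the given version
	it := txn.NewIterator(badger.DefaultIteratorOptions)
	it.Rewind()
	return &badgerExporter{txn: txn, it: it}
}

// Next returns the next KV pair wrapped in a SnapshotItem, or io.EOF
// once the store is exhausted.
func (e *badgerExporter) Next() (SnapshotItem, error) {
	if !e.it.Valid() {
		return SnapshotItem{}, io.EOF
	}
	item := e.it.Item()
	key := item.KeyCopy(nil)
	value, err := item.ValueCopy(nil)
	if err != nil {
		return SnapshotItem{}, err
	}
	e.it.Next()
	// SnapshotItem_Kv is an assumed name for the generated oneof wrapper.
	return SnapshotItem{Item: &SnapshotItem_Kv{Kv: &SnapshotKVItem{Key: key, Value: value}}}, nil
}

func (e *badgerExporter) Close() {
	e.it.Close()
	e.txn.Discard()
}
```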
## Interface for historical versions <a id="versioning"/>
Tendermint DB will need to expose an interface for accessing and viewing historical versions of records. The simplest way to do this may be to supply a new interface type representing a read-only DB, and a new method like `DB.AtVersion(uint64) (ReadOnlyDB, error)`. It may also be convenient to provide a `DB.GetAt([]byte, uint64)` method to access a single record.
This can be implemented like so:
* BadgerDB in managed mode allows arbitrary integers to be used as "timestamp" versions. `DB.NewTransactionAt()` is used to view DB state at a version.
* For RocksDB, a [`Checkpoint`](https://github.com/facebook/rocksdb/wiki/Checkpoints) can be created for each commit in a path corresponding to the version integer, and then opened as a read-only copy of the DB.
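A minimal sketch of what this interface could look like, assuming the existing tm-db `DB` and `Iterator` types (the names below are illustrative, not a settled API):
```go
// ReadOnlyDB is a read-only view of the database at a fixed version.
type ReadOnlyDB interface {
	Get(key []byte) ([]byte, error)
	Has(key []byte) (bool, error)
	Iterator(start, end []byte) (Iterator, error)
	Close() error
}

// VersionedDB extends the regular DB interface with access to historical versions.
type VersionedDB interface {
	DB

	// AtVersion returns a read-only view of the DB at the given version.
	AtVersion(version uint64) (ReadOnlyDB, error)

	// GetAt is a convenience accessor for a single record at a given version.
	GetAt(key []byte, version uint64) ([]byte, error)
}
```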
### Batch support
If we want to implement efficient batched writes to the SMT, the tree will need to be able to read from, as well as write to, the batch itself while building it, which calls for something closer to a transaction object than a plain write batch. This could be supported in the following way:
* RocksDB has a [Write Batch With Index](https://github.com/facebook/rocksdb/wiki/Write-Batch-With-Index) utility which allows this.
* BadgerDB's `WriteBatch` does not support `Get`s, but, as it is a relatively simple wrapper around a transaction, we could just use `Txn`.
These writable batch types would be wrapped in the `MapStore` interface. The SMT would need a new interface type to represent the batch (e.g. `WriteBatch`). This would wrap a derived `SparseMerkleTree` object using the transaction object as its `MapStore`. All normal SMT operations would then be available on the batch.
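As a sketch (assuming the LL-SMT `MapStore` interface of `Get`/`Set`/`Delete` methods returning errors), a BadgerDB transaction could be adapted like this; error handling for missing keys is simplified:
```go
import (
	"github.com/dgraph-io/badger/v3"
)

// txnMapStore adapts a BadgerDB transaction to the LL-SMT MapStore interface,
// so the tree can read back what it has written within the same (uncommitted) batch.
type txnMapStore struct {
	txn *badger.Txn
}

func (m *txnMapStore) Get(key []byte) ([]byte, error) {
	item, err := m.txn.Get(key)
	if err != nil {
		return nil, err // a real implementation would map badger.ErrKeyNotFound to the SMT's error type
	}
	return item.ValueCopy(nil)
}

func (m *txnMapStore) Set(key, value []byte) error {
	return m.txn.Set(key, value)
}

func (m *txnMapStore) Delete(key []byte) error {
	return m.txn.Delete(key)
}
```
A derived `SparseMerkleTree` constructed over such a `MapStore` would then play the role of the `WriteBatch` object described above.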
### Versioning and snapshots
Creating state sync snapshots is [not required](https://github.com/cosmos/cosmos-sdk/pull/8430#discussion_r626879836) for SMT data.
Versioning may not be needed either; however, if it is desired, it could be implemented with a `MapStore.GetAt([]byte, uint64)` method, which forwards to a `DB.GetAt()` method as described [above](#versioning).
## Decoupling SC and SS `KVStore`s
In the state machine, we need to settle on the pattern used to decouple the state commitment and storage concerns. From the ADR-040 proposal, the store buckets consist of:
>1. SC: `key -> hash(key, value)`: for state commitment we store only the hash(key, value) as the value in the leafs of the state commitment tree.
>2. SS-B1: `key → value`: the principal object storage, used by a state machine, behind the SDK `KVStore` interface: provides direct access by key and allows prefix iteration (KV DB backend must support it).
>3. SS-B2: `hash(key, value) → key`: an index needed to extract a value (through: B2 -> B1) having only a Merkle Path.
A few options for how to implement this are outlined below. The best course of action will depend on the order of priorities.
If we want to prioritize SDK user experience, we should wrap this logic in a single `KVStore` interface which exposes storage data as KV pairs and only exposes SC data as Merkle root hashes (option #1 below). This only makes sense if users are not expected to need to access or iterate over hashes of individual records or the inverted index mappings. (This is the approach taken in the open [SMT store PR](https://github.com/cosmos/cosmos-sdk/pull/8507).)
If the higher priority is to provide SDK users full visibility and control of all component KV buckets, these should be left as loosely coupled as possible (#2). The SC, SS, and index buckets will only be used in tandem at the multi-store level. The multi-store will then expose these as separate `CommitKVStore`s, so that the user can still unambiguously access all underlying data, including both raw values and hashed records, and wrap each store in a `tracekv.Store`, `gaskv.Store`, or another layer. However, if the multi-store is removed in the future, users will be responsible for managing the new state machine semantics unless a new wrapper type is created.
We suggest a third option if all of the above are considered high priorities: introduce a new interface which expands on `KVStore` (#3). Ideally, this will make as much as possible of the underlying store structure available for tooling, while still presenting an SDK interface that is friendly to users.
### 1. Write a unified `KVStore` that generates bucket entries and directs writes/reads to both SS and SC.
Pros:
1. Fewer changes required at the multistore level and above to support the decoupling
2. Unified store interface that can map to a StoreKey
* Multistore exposes a single `CommitKVStore` for each substore
* Presents a single Merklized store interface for SDK users
Cons:
1. Can't individually wrap (gas meter, prefix, listen, trace) each of the underlying stores
2. `Set` and `Delete` have side effects: rather than simply setting or deleting the provided KV pair, they also generate two other KV pairs to set/delete (see the sketch after this list)
3. Return value for some methods is ambiguous, e.g. which of the buckets is the returned iterator iterating over
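To illustrate con #2, a rough sketch of how a unified store's `Set` might fan out writes; this is illustrative only, not the implementation in the linked PR, and the hashing scheme is just an example:
```go
import (
	"crypto/sha256"

	"github.com/cosmos/cosmos-sdk/store/types"
)

// unifiedStore sketches option #1: one KVStore facade over the three buckets.
type unifiedStore struct {
	ss types.KVStore // SS-B1: key -> value
	ii types.KVStore // SS-B2: hash(key, value) -> key
	sc types.KVStore // SC:    key -> hash(key, value) (SMT leaves)
}

// Set fans a single write out into three underlying writes.
func (s *unifiedStore) Set(key, value []byte) {
	// hash(key, value); the exact scheme is determined by the SMT.
	h := sha256.Sum256(append(append([]byte{}, key...), value...))
	hashed := h[:]

	s.ss.Set(key, value)
	s.ii.Set(hashed, key)
	s.sc.Set(key, hashed)
}
```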
### 2. Maintain separate `KVStore`s for SC and SS
Pros:
1. Can individually wrap (gas meter/listen to/trace) each bucket
2. `Set` and `Delete` operations remain pure
3. No method result ambiguity
Cons:
1. Can't map to/differentiate the KVStores with a single StoreKey
* Multistore needs a way to expose each decoupled component of its substores
2. We need logic at a higher level (i.e. multistore) for generating entries and directing reads/writes to the separate SC and SS KVStores.
* Will require more changes at the multistore level and above to accommodate
3. There is talk about deprecating the multistore - if the multistore is deprecated, SDK users would be responsible for managing each store
### 3. Create a new KVStore subtype
With option #1 above, the primary issue is not being able to wrap the underlying stores. With #2, the primary issue is that we disrupt the current 1-to-1 mapping of `StoreKey` to a KVStore. We can avoid both by defining a new interface and some supporting types:
Pros:
* Unified KVStore interface that maintains 1-to-1 mapping with StoreKey
* Enables wrapping of underlying KVStores
* More explicitly types the new behavior of the unified store
Cons:
* Requires refactoring of wrapping code in multistore and above
* New `StoreKind` enum
* Possibly confusing to module developers?
Example synopsis:
```go
// Wrapper wraps the receiver's underlying KVStore of the specified kind with the provided KVStore.
type Wrapper interface {
	Wrap(k KVStoreKind, w KVStore)
}

// StateStore is a KVStore that directs reads/writes from/to underlying state commitment,
// state storage, and state index KVStores.
type StateStore interface {
	KVStore
	Wrapper
}

// Store is an example StateStore; each bucket is a KVStore that can be independently wrapped.
type Store struct {
	// Direct KV mapping (SS)
	ss types.KVStore
	// Inverted index of SC values to SS keys
	ii types.KVStore
	// State commitment layer, LL-SMT KVStore
	sc types.CommitKVStore
}

// KVStoreKind is an enum used to differentiate between the kinds of KVStores underlying a StateStore.
// A StoreKey remains mapped 1-to-1 with a StateStore; these kinds distinguish between its substores.
type KVStoreKind int

const (
	StateCommitment KVStoreKind = iota
	StateStorage
	InvertedIndex
)
```
For all purposes other than wrapping, we can use the `StateStore` as any other `KVStore`. Methods that need to wrap an underlying store *after* loading it into the `StateStore` will need to use the new methods (e.g. `MultiStore.GetKVStore` when tracing or listening is enabled).
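For concreteness, one possible (hedged) reading of `Wrap` on the example `Store`, where the provided `KVStore` simply replaces the bucket of the given kind:
```go
// Wrap replaces the bucket of the given kind with the provided KVStore,
// which is expected to wrap the original bucket (e.g. a tracekv.Store).
// Sketch only; the real implementation may restrict which kinds can be wrapped.
func (s *Store) Wrap(k KVStoreKind, w types.KVStore) {
	switch k {
	case StateStorage:
		s.ss = w
	case InvertedIndex:
		s.ii = w
	case StateCommitment:
		// sc is declared as a CommitKVStore above, so wrapping it with a plain
		// KVStore would hide its Commit methods; a real implementation would
		// need to handle this case explicitly.
	}
}
```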