# IAVL v2 Design Highlights

Disk:

- Separate tree per store key.
- Separate db per store key (unlocks concurrent writes).
- Separate tree and leaf storage per db (unlocks concurrent writes).

Key format:

- Same as IAVL v1: (version, sequence).
- Version is the block height; sequence is the insertion order within version _n_ (an encoding sketch appears at the end of this document).

At every block commit:

- Hash the tree, then write the root node with its hash.
- Write leaf nodes and leaf deletes. This forms a WAL (write-ahead log) for replay.
- Determine whether the tree should checkpoint; if so, write all dirty tree nodes (a commit/checkpoint sketch appears at the end of this document).
- The above means that dirty tree nodes *must* remain in memory until the next checkpoint!
- When the tree checkpoints version _n_ and the last checkpoint was version _m_, only nodes with _version > m_ are included in the version _n_ checkpoint.

Cache:

- IAVL v2 doesn't have a secondary cache; the tree held in memory is the cache.
- The eviction policy is based on tree layers/height and a memory estimate.
- A cache near its ceiling with dirty nodes triggers a checkpoint.

Replay:

- To load a tree at version _n_, load the most recent checkpoint _m_ where _m < n_, then replay the leaf changes since that version (a replay sketch appears at the end of this document).
- When not running in archive mode, historical proofs are supported by replaying the tree to the requested version.

Prune:

- Remove _orphaned_ tree nodes from old checkpoints. A node is orphaned in version _n_ when it no longer has any references in that version of the tree.

Archive mode (not yet implemented):

- Same as above, but checkpoint at every version.
- Therefore no replay is needed between checkpoints.

### Why SQLite?

- WAL mode supports many readers with a single concurrent writer.
- B-trees are faster than LSM trees for reads, and IAVL workloads are read-heavy.
- Bulk writes are very fast (profiled as faster than LevelDB).
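
The (version, sequence) key format maps naturally to a fixed-width, big-endian byte encoding so that node keys sort by version first. The sketch below is illustrative only and assumes an 8-byte version plus 4-byte sequence layout; `nodeKey` and `bytes` are hypothetical names, not the actual IAVL v2 types.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nodeKey mirrors the (version, sequence) addressing described above:
// version is the block height, sequence is the insertion order within
// that version. The exact on-disk layout is an assumption here.
type nodeKey struct {
	version  int64
	sequence uint32
}

// bytes encodes the key big-endian so that lexicographic ordering in the
// backing store matches (version, sequence) ordering.
func (nk nodeKey) bytes() []byte {
	b := make([]byte, 12)
	binary.BigEndian.PutUint64(b[0:8], uint64(nk.version))
	binary.BigEndian.PutUint32(b[8:12], nk.sequence)
	return b
}

func main() {
	// third node inserted at block height 100
	fmt.Printf("%x\n", nodeKey{version: 100, sequence: 3}.bytes())
	// prints 000000000000006400000003
}
```

With this ordering, all nodes written at one version occupy a contiguous key range, which keeps version-scoped scans and pruning straightforward.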
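
The per-block commit flow can also be sketched end to end. Everything below (the `storage` interface, the `tree` fields, `saveRoot`, `saveLeaves`, `saveBranches`, and the checkpoint trigger) is a hypothetical stand-in chosen to mirror the steps above, not the real IAVL v2 API.

```go
package main

// leafChange is a single key/value write or delete recorded at a version.
type leafChange struct {
	key, value []byte
	deleted    bool
}

// node stands in for an in-memory tree node.
type node struct {
	hash []byte
}

// storage stands in for the SQLite-backed layer (hypothetical methods).
type storage interface {
	saveRoot(version int64, hash []byte) error             // root node with hash
	saveLeaves(version int64, changes []leafChange) error  // the WAL
	saveBranches(version int64, dirty []*node) error       // a checkpoint
}

type tree struct {
	sql            storage
	root           *node
	leafChanges    []leafChange // writes accumulated for the current version
	dirty          []*node      // tree nodes created since the last checkpoint
	lastCheckpoint int64
	memoryEstimate int64 // estimated bytes held by the in-memory tree
	memoryCeiling  int64
}

// shouldCheckpoint sketches the trigger: checkpoint when the in-memory
// tree (which doubles as the cache) approaches its memory ceiling.
func (t *tree) shouldCheckpoint() bool {
	return t.memoryEstimate >= t.memoryCeiling
}

// commit performs the per-block steps: write the hashed root, write the
// leaf WAL, and optionally checkpoint dirty nodes. Until a checkpoint
// happens, dirty nodes must stay in memory.
func (t *tree) commit(version int64) ([]byte, error) {
	// 1. the tree is assumed already hashed into root.hash; persist the root
	if err := t.sql.saveRoot(version, t.root.hash); err != nil {
		return nil, err
	}
	// 2. persist leaf inserts/deletes; this is the WAL used for replay
	if err := t.sql.saveLeaves(version, t.leafChanges); err != nil {
		return nil, err
	}
	// 3. on checkpoint, flush only nodes created since the last checkpoint
	//    (version > m); older nodes are already on disk
	if t.shouldCheckpoint() {
		if err := t.sql.saveBranches(version, t.dirty); err != nil {
			return nil, err
		}
		t.dirty = t.dirty[:0]
		t.lastCheckpoint = version
	}
	t.leafChanges = t.leafChanges[:0]
	return t.root.hash, nil
}

func main() {}
```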
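
Replay follows directly from the leaf WAL written at each commit. The sketch below assumes hypothetical query helpers (`latestCheckpointBefore`, `loadCheckpoint`, `leafChangesAt`) and shows how a tree at version _n_ could be reconstructed from checkpoint _m_ plus the subsequent leaf changes; in archive mode every version would be a checkpoint, so the replay loop would be empty.

```go
package main

// leafChange is a single key/value write or delete recorded at a version.
type leafChange struct {
	key, value []byte
	deleted    bool
}

// tree is a placeholder for the in-memory IAVL tree.
type tree struct{}

func (t *tree) set(key, value []byte) { /* insert or update a leaf */ }
func (t *tree) remove(key []byte)     { /* delete a leaf */ }

// replaySource stands in for the queries the storage layer would need
// to support replay (hypothetical methods).
type replaySource interface {
	latestCheckpointBefore(version int64) (int64, error)
	loadCheckpoint(version int64) (*tree, error)
	leafChangesAt(version int64) ([]leafChange, error)
}

// loadVersion rebuilds the tree at version n: start from the most recent
// checkpoint m with m < n, then re-apply the leaf WAL for versions
// m+1 through n.
func loadVersion(src replaySource, n int64) (*tree, error) {
	m, err := src.latestCheckpointBefore(n)
	if err != nil {
		return nil, err
	}
	t, err := src.loadCheckpoint(m)
	if err != nil {
		return nil, err
	}
	for v := m + 1; v <= n; v++ {
		changes, err := src.leafChangesAt(v)
		if err != nil {
			return nil, err
		}
		for _, c := range changes {
			if c.deleted {
				t.remove(c.key)
			} else {
				t.set(c.key, c.value)
			}
		}
	}
	return t, nil
}

func main() {}
```

Outside archive mode, historical proofs would use the same path: replay to the requested version, then produce the proof from the reconstructed tree.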