Persistent caching

## Summary

Being able to save the database to disk and reload it (cheaply).

## 2025-06-11

High-level strategy:

* https://github.com/nikomatsakis/salsa/tree/serialization
* mark the tracked structs as persistable (serializable may be useful independently...)
* and list out the things you may serialize when creating the database
    * eagerly add those ingredients so we know the indices don't change
    * panic somewhere or other if an ingredient that may be serialized is not in that list
* storing/loading the data is then relatively straightforward, except for:
    * how do you manage inputs?
    * how do you remove intermediate nodes you don't want to serialize?
    * etc.
* this must be explored

## 2025-01-06

Goal:

* `db.serialize_to(...)`
* `let db = RustAnalyzerDb::deserialize_from(...)`
    * given some kind of source (e.g., a path) where previous state was serialized
    * it deserializes as little as possible up front and lazily deserializes existing items as they are accessed

In rust-analyzer, we populate the defmap from here: https://github.com/rust-lang/rust-analyzer/blob/bfb81275fb746dadb7664831d7d7611fd72cc955/crates/ide-db/src/prime_caches.rs#L67-L72

With Salsa 3.0, this is basically what we have:

* an input `crate_graph` that references other inputs (`Crate`)
* a tracked function `crate_def_map(CrateId) -> DefMap`
* a tracked struct [`DefMap`](https://github.com/rust-lang/rust-analyzer/blob/979e3b54f70f6f231c117a5d628b98106e5c7d31/crates/hir-def/src/nameres.rs#L105-L137)
* the only "external" things it uses are salsa structs (inputs, interned, tracked)

```rust
#[salsa::tracked(serialize)]
fn crate_def_map(db: &dyn crate::Db, krate: Crate) -> DefMap {
    // builds the `DefMap` for `krate` (body elided)
    todo!()
}

#[salsa::tracked(serialize)]
struct DefMap {
    // ... anything in here has to be serializable
}

#[salsa::input(serialize)]
struct Crate {
    // ...
}
```

* API idea
    * You tag types as `serialize` when they are declared, which generates the `Serialize` impl (as above)
    * You can "serialize" given a set of starting input roots
        * and a set of types LS (serializable jars, e.g., salsa structs, tracked functions, etc.) that may be serialized. We need LS because we need to do a `type_of(serializedType)` during deserialization.
        * all (de)serializable elements need to be Salsa structs (or impl `salsa::Update`)
        * transitive tracked fns *also* need to be called.
        * this list needs to be given to the database at the time of the database's creation.
        * assertion failure if any type needs to be serialized that is not in LS
    * You "deserialize" by giving that same set LS, and you get back the set of roots
* Database creation
    * When a salsa structure tagged as `serialize` is added to the database:
        * serializable ingredients
        * might want a static way to find the ones that are unexpected
* Serialization
    * User provides a set of salsa inputs that are the "roots" of serialization
    * Salsa will serialize:
        * Let `SalsaStructsToBeSerialized` be the roots of serialization
        * Until a fixed point is reached:
            * Extract an id `ID` from `SalsaStructsToBeSerialized`
            * Serialize `ID`, including attached memos
                * for everything serializable, serialize the memo
                    * which will serialize the return type of the memo (which can be a tracked/interned struct)
    * Serializing a salsa struct
* Deserialization

## Open question

What traits would we need, and can we make them optional? Some state in rust-analyzer might not be serializable; rust-analyzer would probably only want to (de)serialize [DefMap](https://github.com/rust-lang/rust-analyzer/blob/979e3b54f70f6f231c117a5d628b98106e5c7d31/crates/hir-def/src/nameres.rs#L105-L137).

- in rust-analyzer, (de)serialization is mostly the exception.
- For `DefMap`, `CrateId` would likely be the ID.
- `DefDatabase::crate_def_map` would need to be a tracked function.
- `CrateId` would need to be (de)serializable.
- [`CrateData`](https://github.com/rust-lang/rust-analyzer/blob/9aa42935947024090d423b0cec801aee59132f5e/crates/base-db/src/input.rs#L276-L294) would need to be a `#[salsa::input]`.

## 2024-10-09

How would this work? Three key parts:

* View map
    * not serializable; populated lazily upon read/write usage.
* Ingredients -- general metadata
    * Has some mutable state (LRU), but it can be removed. For functions, there's a free list.
    * Might need to be (de)serialized as well; serves as the "schema".
* Table -- data for each entity
    * The data itself!

We're pretty sure there needs to be a mechanism to list all the (de)serialized tracked functions and structs: this makes the persistent state contract an explicit API boundary.
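The listing mechanism above, combined with the 2025-06-11 plan (declare everything that may be serialized when the database is created, add those ingredients eagerly so their indices never change, and panic on anything unlisted), can be sketched as a small registry. This is a minimal, self-contained sketch, not Salsa's API: `PersistRegistry` and `IngredientIndex` are hypothetical names, and `std::any::TypeId` stands in for however Salsa actually identifies ingredients.

```rust
use std::any::{type_name, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for an ingredient's index in the database.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct IngredientIndex(pub u32);

/// Hypothetical registry of every type that may be (de)serialized ("LS").
/// It is populated when the database is created, so indices stay stable
/// between the run that serializes and the run that deserializes.
#[derive(Default)]
pub struct PersistRegistry {
    by_type: HashMap<TypeId, IngredientIndex>,
    // `names[i]` is the type registered with `IngredientIndex(i)`.
    names: Vec<&'static str>,
}

impl PersistRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Eagerly registers `T`; registering the same type again returns its existing index.
    pub fn register<T: 'static>(&mut self) -> IngredientIndex {
        let key = TypeId::of::<T>();
        if let Some(&idx) = self.by_type.get(&key) {
            return idx;
        }
        let idx = IngredientIndex(self.names.len() as u32);
        self.names.push(type_name::<T>());
        self.by_type.insert(key, idx);
        idx
    }

    /// Index written to disk when a `T` is serialized.
    /// Panics if `T` was not listed when the database was created.
    pub fn index_of<T: 'static>(&self) -> IngredientIndex {
        match self.by_type.get(&TypeId::of::<T>()) {
            Some(&idx) => idx,
            None => panic!(
                "`{}` may not be serialized: it was not registered at database creation",
                type_name::<T>()
            ),
        }
    }

    /// Used during deserialization to identify which registered type a stored
    /// index refers to. Returns `None` for an index this registry never handed out.
    pub fn name_of(&self, idx: IngredientIndex) -> Option<&'static str> {
        self.names.get(idx.0 as usize).copied()
    }
}
```

`type_name` appears here only for diagnostics: its output is not guaranteed to be stable across compiler versions, so an on-disk format would need explicit, stable identifiers rather than these strings.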
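The "until fixed point is reached" serialization loop from the 2025-01-06 notes is essentially a worklist traversal. A minimal sketch, assuming a toy in-memory model: `Id`, `Entity`, `Memo`, `Record`, and `serialize_from_roots` are hypothetical, memo values are pre-rendered strings, and each memo carries a `serializable` flag plus the ids of the salsa structs its return value references. Skipping non-serializable memos (and not following their references) is one possible answer to the "how do you remove intermediate nodes" question, not a settled design.

```rust
use std::collections::{BTreeMap, HashSet, VecDeque};

/// Hypothetical id of a salsa struct (input, interned, or tracked).
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Id(pub u32);

/// Hypothetical memo attached to a salsa struct.
pub struct Memo {
    pub function: &'static str,
    pub serializable: bool,
    /// The memo's return value, pre-rendered for this sketch.
    pub value: String,
    /// Salsa structs referenced by the return value (e.g. a returned tracked struct).
    pub referenced: Vec<Id>,
}

/// Hypothetical in-memory view of one salsa struct.
pub struct Entity {
    pub fields: String,
    /// Salsa structs referenced from this struct's fields.
    pub refs: Vec<Id>,
    pub memos: Vec<Memo>,
}

/// What ends up on disk for one salsa struct.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub id: Id,
    pub fields: String,
    pub memos: Vec<(&'static str, String)>,
}

/// Serializes everything reachable from `roots`, iterating until no
/// unvisited struct remains (the fixed point).
pub fn serialize_from_roots(db: &BTreeMap<Id, Entity>, roots: &[Id]) -> Vec<Record> {
    let mut seen = HashSet::new();
    let mut pending = VecDeque::new();
    for &root in roots {
        if seen.insert(root) {
            pending.push_back(root);
        }
    }

    let mut out = Vec::new();
    while let Some(id) = pending.pop_front() {
        let entity = db
            .get(&id)
            .unwrap_or_else(|| panic!("dangling reference to {:?}", id));

        // Structs referenced by fields must be serialized too.
        let mut reached: Vec<Id> = entity.refs.clone();

        // Serialize the memos attached to this struct; skip (and do not
        // traverse) memos whose function was not marked serializable.
        let mut memos = Vec::new();
        for memo in entity.memos.iter().filter(|m| m.serializable) {
            memos.push((memo.function, memo.value.clone()));
            reached.extend(memo.referenced.iter().copied());
        }

        // Queue newly reached structs exactly once.
        for next in reached {
            if seen.insert(next) {
                pending.push_back(next);
            }
        }
        out.push(Record { id, fields: entity.fields.clone(), memos });
    }
    out
}
```

Because `seen` is checked before anything is queued, each struct is serialized once even when references form a cycle (for example, a `DefMap` that refers back to its `Crate`), so the loop always reaches its fixed point.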