owned this note
owned this note
Published
Linked with GitHub
# Salsa 2020
Idea: It seems like we've accumulated some experience and goals thanks to rust-analyzer. Let's talk about how to reimagine Salsa and make it better.
## Random links and notes
* wycats's [Everafter](https://github.com/wycats/everafter/) might have some relevant ideas
## Niko's goals
* Integrate salsa/chalk queries and accommodate cycles
* including negative reasoning, which is a bit tricky
* Simplify parallel handling
* "Idempotent" queries as the default
* Move the cycle handling to some "side table" checks to accomodate "synchronized" queries
* [Earlier proposal](https://hackmd.io/FCrUiW27TnKw3MTvRtfPjQ) described idempotent queries and recursive solver but overlooked negative cycles
* still the same basic approach I have in mind
## Matklad's complaints
* Most things are arked, the storage is typically `HashMap<Id, Arc<Thing>>`. All but a couple of queries in rust-analyzer uses intrned ID as a key, so `Vec<Thing>` layout seems more desirable.
* Question: are those IDs *dense*? (yup)
* seems like they would accumulate "holes" potentially, maybe you don't care
* User is reponsible for managing storage. "Child" entities are embedded in the "parent" entities. For example, crate_def_map holds modules, sturts hold fields. Managing storage takes a fair amount of code, it would be cool to have ECS-style uniform storage, when you don't have to manually add fields to parent entities.
* sometimes it's more convenient to "push" entities. For example you walk the struct and "create" new struct fields. At the moment, rust-analyzer requires you to "pull" entities -- a "struct field" is a "struct + local index", so to get a field one gets the struct and then looksup a field. A more convenint programming model would be to lookup in a single step an already "created" field.
* rust-analyzer at the moment doesn't use parallel queries.
* obvious candidate is `crate_def_map` -- we want to process dependenceis in parallel
* parallel is at odds with lazy -- to partition the work, you need to build the worklist upfront. It might turn out that some work is unnecessary
* Compile times: rust-analyzer uses `dyn DB` everywhre, but this is awkward (manual upcasting) and compile times stil feel pretty slow.
* unclear if there's any specific problem here beyond "there's a lot of code"
* but it seems like salsa monomorphises a lot of things: https://github.com/salsa-rs/salsa/issues/220
* the end goal here is separate compilation. The db-defning crate should expose fully non-generic interface, to make sure that (with perfect incremental compilation) only re-linking is required if the set of queries itself is not modified.
* GC story is unknown: rust-analyzer doesn't do GC beyound LRU for syntax and token trees
* There's a desire to be able to fork/transiently mutate the database.
* For things like documentation comments, we want to run analysis on `` ```rust `` snippets at the IDE layer, without teaching the compiler that there's certian semantics of doc comments
* For completion, we want to insert a fake identifier at the cursor position (to be able to get the type of would-be inserted identifier, etc)
* Ideally, we want to be able to call `set` without cancelling other in progress queries.
Relative Priority, from highest to lowest:
1. Separate Compilation/non-generic interface (compile times are already painful, and won't get better)
2. Parallelism (affects programming model)
3. Everything else
# 2020.06.22
Some notes.
---
## Embedding of data structures
a struct field is kind of an index and a StructId:
```rust
struct StructFieldId {
id: StructId,
index: usize,
}
// Getting the type gives a lot of boilerplate:
db.struct_info(field.id).field(field.index).ty
// Removing the boilerplate could be done like so:
impl StructFieldId {
fn ty(&self, db: &dyn Db) {
db.struct_info(self.id).field(self.index).ty
}
}
field.ty(db)
```
* but now you have to define the boilerplate
* and sometimes you have to move those things
* very common pattern is that you have to compute the *names* of things up front but the details can be deferred
* e.g., the types of the field
* can we move in a more ECS-like direction
* is this connected to push vs pull
* we would sort of like to write code that walks over the structs and "creates" the set of fields that later code can use
* the rule is that entity ids cannot be used across edits
* which, if you think about it, the only thing that logically makes sense
* like, if there have been edits, how do I know the entity even still exists?
* lazy computations -- how convenient can we make it
## compilation times
* Upcasts are awkward
* what else makes dyn awkward? just having to write dyn
* Compilation times -- [salsa does seem to monomorphize a lot](https://github.com/salsa-rs/salsa/issues/220)
```rust
pub(super) struct Slot<DB, Q, MP>
where
Q: QueryFunction<DB>,
DB: Database + HasQueryGroup<Q::Group>,
MP: MemoizationPolicy<DB, Q>,
{
key: Q::Key,
state: RwLock<QueryState<DB, Q>>,
policy: PhantomData<MP>,
lru_index: LruIndex,
}
```
* `TyCtxt<'tcx>`
* `type IdeDb<'a> = &'a dyn IdeDbTrait;`
* matklad points out that this is a conceptual burden, you have to explain "why the dyn", it'd be simpler to have a more "flat", struct-like interface
* Limitations of salsa's hierarchical model
* You can't define a database that has methods that it uses but doesn't have the definition
* You can have dependencies on another crate, but then it provides the dependencies
* This is apparent in chalk
* (You can do this in salsa, but not natively)
* Parser query definition
* Parser query implementation
* Type checker query definition
* Type checkery query implementation
```rust
trait ParserQueries {
...
}
#[salsa::database]
trait ParserQueriesDB: ParserQueries {
}
#[salsa::database]
trait TypeCheckerQueries: ParserQueries {
}
impl<DB: ParserQueriesDB> ParserQueries for DB {
...
}
```
## Parallelism
* Idempotency
* Intra-query:
* futures (not `async` futures necessarily)
* actually prefetch is perhaps the model we want
* parallel iterator
## Actual steps
* implementing a `prefetch` operation
* modeling the "interface" trait / "implementation" pattern and seeing if we can make it nicer and what the obstacles are
* adding entities to salsa / re-orienting around vectors with "set" operations
* wycats's [Everafter](https://github.com/wycats/everafter/) might have some relevant ideas